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In  this  thesis,  results  of  a  study  of  the  heuristic  random  search  optimization  method  called 
simulated  annealing  are  given.  Most  of  the  results  are  concerned  with  the  average  amount  of  time 
simulated  annealing  takes  to  find  an  acceptable  solution. 

We  analyzed  the  average  time  complexity  of  simulated  annealing  for  the  matching  problem. 
Although  the  matching  problem  has  worst-case  polynomial  time  complexity,  we  show  that  there  is 
a  sequence  of  graphs  where  the  average  time  complexity  of  a  "natural'*  version  of  simulated 
annealing  is  at  least  exponential.  In  contrast,  we  show  that  the  "natural"  version  of  simulated 
annealing  has  a  worst-case  polynomial  average  time  complexity  if  it  is  only  required  to  find  "near” 
maximum  matchings.  An  exponential  lower  bound  on  the  minimum  average  time  complexity  over 
a  wide  class  of  simulated  annealing  algorithms  when  our  attention  is  restricted  to  constant  tempera¬ 
ture  schedules  is  also  given. 

The  typical  case  for  simulated  annealing  for  the  matching  problem  is  also  analyzed.  Since  we 
were  not  able  to  discover  a  method  to  exactly  analyze  the  average  time  complexity  of  simulated 
annealing  for  the  matching  problem  for  "typical"  graphs,  we  used  approximations  to  estimate  the 
average  time  complexity  and  then  checked  the  accuracy  of  the  approximation  with  data  from  com¬ 
puter  simulations.  Our  results  indicate  that  if  we  only  consider  graphs  that  have  at  least  as  many 
edges  as  they  have  nodes  then  the  average  time  complexity  of  simulated  annealing  for  a  typical 
graph  with  n  nodes  is  0(n4). 

A  technique  for  producing  easy-to-analyze  annealing  processes,  called  the  template  method,  is 
given.  It  is  our  hope  that  this  method  will  produce  interesting  examples  of  simulated  annealing 


that  will  help  us  to  understand  the  heuristic.  We  provide  two  examples  of  using  the  template 
method  to  analyze  the  finite-time  behavior  of  simulated  annealing  as  a  function  of  the  temperature 
schedule.  A  generalization  of  simulated  annealing,  which  we  refer  to  as  the  threshold  random 
search  algorithm,  is  presented.  We  also  give  conditions  under  which  no  monotone  decreasing  tem¬ 
perature  schedule  is  optimal. 

Finally,  we  discuss  the  use  of  quadratic  penalty  methods  in  conjunction  with  simulated 
— aiing  to  solve  problems  with  equality  constraints.  An  experimental  evaluation  is  made 
between  adaptive  and  static  quadratic  penalty  methods,  and  it  is  shown  that  adaptive  quadratic 
penalty  methods  can  provide  low-valued  solutions  over  a  wider  range  of  penalty  parameter  values 
than  static  quadratic  penalty  methods. 
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CHAPTER  1 

INTRODUCTION 

Simulated  annealing  is  a  heuristic  random  search  technique,  introduced  independently  by 

V 

Kirkpatrick  et  al.  [1]  and  Cemy  [2],  for  finding  approximate  solutions  to  combinatorial  optimiza¬ 
tion  problems.  It  is  a  variation  of  the  local  improvement  technique  in  which  an  initial  solution  is 
repeatedly  improved  by  perturbing  it  until  it  reaches  a  "local  minimum,"  i.e.,  a  solution  where  no 
improvement  is  possible  by  perturbing  it.  A  drawback  of  the  local  improvement  method  is  that  the 
search  may  terminate  in  poor  local  minima.  Simulated  annealing  tries  to  avoid  getting  stuck  in 
poor  local  minima  by  randomly  accepting  some  perturbations  that  worsen  the  solution  as  well  as 
accepting  all  perturbations  that  improve  it. 

Simulated  annealing  has  many  of  the  attractions  of  the  local  improvement  method,  such  as 
the  relative  ease  of  implementation  on  new  problems  and  the  modest  amounts  of  memory  usually 
required  by  these  implementations.  Since  simulated  annealing  is  a  simple  heuristic  method,  it  has 
been  applied  to  solve  a  variety  of  problems,  such  as  generating  error-correcting  codes  [3],  restoring 
images  automatically  [4],  and  designing  VLSI  circuits  automatically  [1],  and  since  simulated 
annealing  typically  requires  only  a  modest  amount  of  memory,  it  has  usually  been  applied  to  solve 
problems  with  many  variables.  Empirical  results  show  that  simulated  annealing  will  usually  find  a 
better  solution  than  the  local  improvement  method,  but  at  a  cost  of  a  longer  run  time. 

With  these  properties,  simulated  annealing  has  been  and  will  be  applied  to  find  approximate 
solutions  to  many  useful  problems.  Hence,  it  is  important  to  analyze  the  performance  of  simulated 
annealing  and  identify  important  parameters  that  govern  that  performance.  For  the  remainder  of 
this  introduction  we  will  precisely  describe  a  simulated  annealing  algorithm  applied  to  solving  a 


generic  combinatorial  optimization  problem,  briefly  review  some  of  the  directions  of  past 
theoretical  research,  and  discuss  the  organization  of  this  thesis. 


Suppose  we  want  to  solve  the  generic  combinatorial  optimization  problem 

min{c(s):  seS), 

where  S  is  a  finite  set  and  c  is  a  cost  function  c:  S-»R.  In  addition,  suppose  we  have  a  transition 
probability  matrix  R  over  S  and  a  sequence  (Tk:  k  2  0)  (called  the  temperature  schedule  )  of  posi¬ 
tive  extended  real  valued  numbers.  Typically,  the  temperature  schedule  is  monotone  decreasing  to 
zero.  A  state  s  is  referred  to  as  a  neighbor  of  s'  if  R,-,  >  0.  A  simulated  annealing  algorithm 
applied  to  this  problem  constructs  a  sequence  (Xk:  k  >  0)  of  states  in  the  following  way.  An  initial 
state  Xq  is  chosen.  Given  Xk  =  s,  a  potential  next  state  Yk  is  chosen  with  probability  distribution 

P(Yk  =  s'|Xk  =  s]  =  R,,'. 

Then  we  set 


where 


Xk+1  = 


Yk  if  cCYk)  <;  c(s) 

«  Yk  with  probability  p^  if  cCYfc)  >  c(s) 
Xk  otherwise, 


Pit  =  exp 


-max{c(Yk)  -  c(s),0} 


This  specifies  how  the  sequence  (Xk:  k  S  0)  is  chosen.  The  random  process  (Xk:  k  £  0)  produced 
by  the  algorithm  is  a  discrete  time  Markov  chain,  and  we  will  call  it  the  annealing  process  on  sys¬ 
tem  (S,  c,  R)  and  with  temperature  schedule  (Tk:  k  £  0). 


For  some  specified  time  K  (which  is  possibly  a  random  time),  the  algorithm  returns  XK.  If 
the  amount  of  memory  permits,  the  algorithm  can  be  modified  to  return  the  lowest  valued  member 

of  (Xq,  X[ . XK}  rather  than  XK.  Note  that  if  the  temperature  schedule  is  identically  equal  to 

zero  then  the  simulated  annealing  algorithm  is  a  local  improvement  algorithm. 


We  now  briefly  review  some  directions  of  past  theoretic^  research  on  simulated  annealing. 
One  of  the  popular  directions  of  theoretical  research  on  simulated  annealing  is  to  determine  condi¬ 
tions  on  the  transition  probability  matrix  R  and  the  temperature  schedule  (Tk:  k  £  0)  so  that 

lim  P[Xk  €  S*]  =  1, 

where  S*  is  the  subset  of  states  of  S  that  have  minimal  cost.  The  results  of  this  type  lead  to 
insights  on  the  dynamics  of  the  annealing  process. 

Another  direction  of  theoretical  research  is  to  analyze  the  finite  time  behavior  of  the  anneal¬ 
ing  process.  We  will  give  short  descriptions  of  three  results  of  such  analysis.  The  first  result,  by 
Mitra  et  al.  [5],  is  an  upper  bound  on  the  distance  between  the  state  probability  vector  of  Xk  and  a 

probability  vector  (it,:  s  e  S)  such  that  £  =  1-  The  following  is  a  simple  corollary  to  their 

seS* 

result.  Suppose  the  temperature  schedule  (Tk:  k  £  0)  is  such  that  Tk  - - ^ - ,  ko  £  1, 

log(k  +  ko  +  1)  ^ 

and  y  >  rL,  where  r  is  the  radius  of  the  graph  underlying  the  annealing  process  (Xk:  k  £  0)  and  L 
is  a  Lipschitz-like  constant  of  the  cost  function.  Then,  for  a  large  number  of  iterations  k, 

£|P[Xk  =  s]-*s|  =  0(l/kmin(,’b>), 

seS 

where  a  and  b  respectively  increase  and  decrease  with  increasing  y. 

The  second  result,  by  Gelfand  and  Miner  [6],  is  a  lower  bound  for 
P[Xj  g  S*:  for  some  j  S  k].  If  the  temperature  schedule  (Tk:  k  >  0)  is  such  that  exp(-l/Tk)  =  k-1/r 
and  T  is  large  enough,  then  this  lower  bound  converges  exponentially  fast  to  zero.  However,  if  T 
is  sufficiently  small,  the  bound  converges  to  a  strictly  positive  number. 

The  third  result,  which  is  a  corollary  to  a  result  by  Lundy  and  Mees  [7],  is  an  upper  bound 
on  E[min(k:  Xk  g  S* } ],  and  is  derived  in  the  following  way.  Let  c*  =  min(c(s):  gS}.  Suppose 
for  some  positive  r  there  are  positive  scalars  a  and  y  such  that  for  all  j  £  0, 


4 


c(Xj+r)  -  c(Xj)  <  y 
and 


E[c(X^r)  -  c(Xj)  lc(Xj)  >  c*]  <  -cl 
A  trivial  extension  of  Wald’s  equation  [8]  yields 


E[min{k:  Xk  e  S‘}]  < 


c(Xq)  -  c*  +  ry 


a 


However,  Lundy  and  Mees  do  not  indicate  how  to  find  values  of  a  and  r. 


Besides  trivial  examples,  most  theoretical  results  on  the  finite-time  behavior  of  the  annealing 
process  have  been  derived  without  much  consideration  of  typical  applications  of  simulated  anneal¬ 
ing.  Therefore,  the  bounds  of  these  results  may  be  very  loose  if  they  are  directly  applied  to  a  par¬ 
ticular  application.  In  Chapters  2  and  3,  we  analyze  simulated  annealing  for  a  particular  nontrivial 
problem,  the  matching  problem.  Upper  and  lower  bounds  on  the  average  time  it  takes  a  simulated 
annealing  algorithm  to  find  a  solution  of  the  matching  problem  are  presented  in  Chapter  2.  These 
bounds  are  worst-case  bounds  over  all  instances  of  the  problem  of  a  specified  size.  In  Chapter  3, 
we  attempt  to  determine  the  average  length  of  time  a  simulated  annealing  algorithm  takes  to  find  a 
solution  to  the  matching  problem  for  a  "typical"  instance.  The  results  of  Chapter  3  arc  based  on 
approximations,  and  these  approximations  are  checked  for  accuracy  by  comparing  them  to  with 
data  from  computer  simulations. 

In  Chapter  4,  we  present  a  collection  of  results.  A  simple  technique  to  cook  up  easy-to- 
analyze  annealing  processes  is  given  in  Section  4.2.  It  is  our  hope  that  interesting  annealing 
processes  that  will  help  us  to  better  understand  simulated  annealing  will  be  produced  by  this 
method.  In  Section  4.3,  we  present  a  random  search  heuristic,  called  the  threshold  search  algo¬ 
rithm,  that  is  a  generalization  of  simulated  annealing.  In  Section  4.4,  conditions  are  given  that 
insure  that  no  monotone  decreasing  temperature  schedules  are  optimal. 


In  Chapter  5,  we  consider  using  simulated  annealing  to  solve  problems  that  have  equality 
constraints.  A  technique  used  to  solve  problems  with  equality  constraints  is  the  quadratic  penalty 
method,  which  transforms  an  equality  constrained  problem  into  an  unconstrained  problem.  Simu¬ 
lated  annealing  can  then  be  used  to  solve  the  transformed  problem.  The  method  of  multipliers  is 
aii  adaptive  quadratic  penalty  method,  and  through  experiments  we  compare  this  method,  and  a 
variation  of  this  method,  with  the  quadratic  penalty  method. 

Finally,  we  summarize  the  results  of  this  thesis  and  provide  possible  directions  for  future 


CHAPTER  2 


MATCHING  PROBLEM:  AVERAGE  PERFORMANCE  FOR  WORST-CASE  GRAPHS 

2.1.  Introduction 

The  introduction  of  this  chapter  is  divided  into  four  subsections.  The  motivation  of  the 
results  of  the  chapter  is  provided  in  Subsection  2.1.1.  Subsection  2.1.2  contains  the  basic  simu¬ 
lated  annealing  algorithm  for  the  matching  problem  that  is  analyzed  in  Subsection  2.1.3,  Sections 
2.2  and  2.3.  In  Subsection  2.1.3,  a  convergence  in  probability  result  of  the  basic  simulated  anneal¬ 
ing  algorithm  in  Subsection  2.1.2  is  given.  Finally,  an  organization  of  the  rest  of  the  chapter  is 
presented  in  Subsection  2.1.4. 

2.1.1.  Motivation 

In  this  chapter,  we  consider  simulated  annealing  applied  to  maximum  matching,  a  fundamen¬ 
tal  problem  in  combinatorial  optimization.  An  instance  of  the  maximum  matching  problem  is  a 
simple  graph  G  =  (V.E),  where  V  denotes  the  set  of  nodes  of  G  and  E  denotes  the  set  of 
(undirected)  edges  of  G.  A  matching  M  in  G  is  a  subset  of  E  such  that  no  two  edges  in  M  share  a 
node.  The  maximum  matching  problem,  for  instance  G,  is  to  find  a  matching  in  G  with  maximum 
cardinality. 

The  maximum  matching  problem  is  easy  in  the  sense  that  there  is  a  known  deterministic 
algorithm  which  solves  the  problem  in  0(VJvJ|E|)  steps  (see  [9]),  where  |V|  is  the  cardinality  of 
V.  However,  we  do  not  consider  maximum  matching  to  be  trivial,  since  the  deterministic  algo¬ 


rithm  is  somewhat  subtle. 


2.12.  The  basic  simulated  annealing  algorithm  for  maximum  matching 

We  will  here  describe  what  is  perhaps  the  most  obvious  way  to  apply  simulated  annealing  to 
search  for  the  maximum  matching  of  a  graph  G  -  (VJE).  Let  Tt,  T2. .  . .  be  a  nonincreasing  tem¬ 
perature  schedule.  We  say  that  an  edge  e  is  matchable  relative  to  a  matching  M  if  e  4  M  and  if 
M+e  is  a  matching  (here  M+e  is  our  notation  for  Mu  (e),  which  we  use  only  if  e  4  M).  Let 
Q(M)  denote  the  set  of  matchable  edges  relative  to  matching  M. 

To  begin  the  algorithm,  choose  an  arbitrary  matching  Xq  in  G  --  for  example.  X0  could  be 

the  empty  set  0.  Having  selected  Xq,  X! . Xk,  choose  X*+i  as  follows.  Choose  an  edge  e  at 

random,  all  edges  in  E  being  equally  likely. 

If  e  is  matchable  relative  to  Xk  let  Xk+1  =  Xk  +  e. 

Xk  -  e  with  probability  expf-l/TJ 

If  e  e  Xk,  let  Xk>l  =  < 

Xk  with  probability  1  -  exp(-l/Tk). 

* 

Else,  let  Xk+i  =  Xk. 

Note  that  (Xk:  k  2  0)  is  an  annealing  process  on  system  (S,  c,  R),  where  S  is  the  set  of  all  match¬ 
ings,  c(s)  is  the  negative  of  the  cardinality  of  s,  and  R  is  a  transition  probability  matrix  over  S  such 
that 

1 

|E|  i  #  j  and  |i®jl  *  1 
Rij  -  '0  |i®j|  >  l 

l-IR*  •  *  j- 

km 

k 

2.1  J.  Convergence  in  probability 

We  begin  by  giving  some  standard  notation  [10].  Given  a  matching  M  in  G.  a  node  v  is 
exposed  if  no  edge  in  M  is  incident  to  v.  A  path  p  in  G  is  a  sequence  of  nodes  p  = 
[v,,v2 . vk],  where  k  2  1,  the  nodes  v,.v2 . vk  are  distinct,  and  [v,,v^|]  e  E  for 


1  S  i  S  k-1.  The  length  of  such  a  path  is  k-1.  The  path  is  augmenting  for  M  if  its  length  is  odd 
(so  k  is  even),  if  v,  and  vk  are  exposed,  and  if  (v^v^.,]  €  M  for  even  values  of  i  with  2  S  i  £  k-2. 
A  well-known  result  of  Berge  and  Norman  and  Rabtn  is  that  a  matching  M  does  not  have  max¬ 
imum  cardinality  if  and  only  if  there  exists  an  augmenting  path  for  M  [10.  Theorem  10.1]. 

Let  Mo  he  a  matching  which  does  not  have  maximum  cardinality,  and  let  [v,.  v2,  .  .  ,  vk)  be 
an  augmenting  path  for  M0.  Starting  from  it  is  possible  for  the  basic  simulated  annealing 
algorithm  to  reach  a  higher  cardinality  matching  by  passing  through  the  sequence  of  matchings 
M)  M;  ...  Mk_,  given  by 

M,  =  Mo  -  [v2,  v3]  M2  «  M,  +  [v,.vj] 

M3  a  M2  -  [V4,  Vj]  M4  =  M3  +  [v3.v4] 


Mk_3  =  Mk_4  —  (vk_2,vk_|j 


Mk„2  *  Mk_3  +  [vk_3.vk_2] 


and  fmallv 


Mk_,  =  Mk_2  +  [vk.vk_,]. 

The  matchings  in  the  sequence  have  cardinality  at  least  as  large  as  ti*  cardinality  of  Mo  minus 
one.  In  the  terminology  of  [11],  the  depths  of  the  local  maxima  for  the  matching  problem  are  at 
most  one.  The  following  theorem  is  thus  an  immediate  consequence  of  [11,  Theorem  1).  A 
matching  M  is  said  to  be  maximal  if  no  edge  is  matchable  relative  to  M.  Let  S*  denote  the  set  of 
matchings  with  maximum  cardinality. 

Theorem  2  11  Let  G  *  (V.E)  be  a  graph  with  a  nonempty  set  of  edges  E.  If  all  maximal 
matchings  of  G  are  in  S*  then 


Jim  P[Xk  e  S*]  ==  1  if  and  only  if  lim  exp(-l/Tk)  »  0 


If  some  maximal  matching  is  not  in  S*  then 


Theorem  2.1.1  gives  a  large-time  asymptotic  result  for  each  fixed  instance  G,  and  the  condi¬ 
tions  do  not  depend  on  the  size  of  G.  In  contrast,  our  goal  in  this  paper  is  to  give  asymptotic 
results  as  |V|  tends  to  infinity.  Interesting,  general  work  on  the  analysis  of  simulated  annealing 
run  for  a  finite  number  of  iterations  has  appeared  (see  [51,  [12],  and  [6],  for  example).  However, 
the  general  theory  does  not  determine,  for  example,  whether  simulated  annealing  exactly  (or 
nearly)  solves  the  maximum  matching  problem  in  an  amount  of  time  growing  as  a  polynomial  in 
|  V  | .  Moreover,  it  is  not  clear  yet  that  any  general  theory  could  answer  such  questions.  In  this 
chapter,  we  present  results  that  we  would  like  to  see  established  more  generally. 

2.1.4.  Organization  of  the  chapter 

In  Section  2  2.  we  show  that  for  a  certain  family  of  graphs  the  basic  annealing  algorithm,  or 
any  other  algorithm  in  a  fairly  large  related  class,  cannot  find  maximum  cardinality  matchings 
using  average  time  that  is  upper  bounded  by  a  polynomial  in  |V|.  In  contrast,  we  show,  in  Sec- 
uon  2.3.  that  a  degenerate  form  of  the  basic  simulated  annealing  algorithm  (obtained  by  letting  Tk 
be  a  suitably  chosen  constant,  independent  of  k)  produces  matchings  with  nearly  maximum  cardi¬ 
nality  using  average  time  that  is  upper  bounded  by  a  polynomial  in  |V|.  Sections  2.2  and  2.3  can 
ne  read  independently.  In  Section  2.4,  we  present  a  lower  bound  on  the  average  time  simulated 
annealing  takes  to  find  nearly  maximum  cardinality  matchings  when  the  temperature  schedule  is 
restneted  to  be  of  constant  value. 

2.2.  The  Impossibility  of  Maximum  Matching  in  Polynomial  Average  Time  Using  Certain 

Simulated  Annealing  Type  Algorithms 

Certain  local  search  algorithms  for  the  maximum  matching  problem  will  be  considered  in  this 
section  The  algorithms  will  not  be  restneted  much  in  an  attempt  to  include  several  implement 
uons  of  simulated  annealing  Both  the  basic  simulated  annealing  algonthm  (given  in  Subsecuon 


2.1.2).  when  Xq  =  0,  and  a  particular  multistan-descent  algorithm  will  be  included  Nevertheless, 
it  will  be  proved  that  the  algorithms  cannot  reach  a  maximum  matching  in  average  time  bounded 
by  a  polynomial  in  |V|,  for  a  particular  family  of  graphs. 


First,  we  allow  the  "temperature”  to  depend  on  both  time  and  the  current  and  past  states  of 
the  algorithm.  Second,  we  assume  that  the  type  of  each  move  can  be  specified  from  among  the 
three  possibilities  whenever  they  exist:  addition  of  an  edge,  deletion  of  an  edge,  no  change.  The 
key  restriction  we  do  impose  is  that  given  the  type  of  a  move,  the  location  of  the  edge  to  be  added 
or  deleted  is  uniformly  distributed  over  the  possible  locations. 

We  thus  view  the  sequence  Xo.  X,....  of  stales  generated  by  the  algorithm  as  a  controlled 
Markov  process.  Suppose  that  "controls"  a,  and  d,  are  given  such  that  for  each  u 

C.l  a,.  d„  a,  +  d,  e  [0.1]  with  probability  one.  and  a,  and  d,  are  functions  of  (Xq.  ..  X,). 


C  2.  P[X,„|  =  M'  |Xt  =  M,  X,_j . XoJ 


I  -  «»  -  d, 

if  M  «  M' 

d,/|M| 

if  e  e  M  and  M'  *  M— e 

VlQ(M)| 

if  e  e  Q(M)  and  M'  =  M+e 

0 

if  |M9M'|  2  2. 

where  M®M'  denotes  the  symmetric  difference  of  M  and  M'  (recall  that  Q(M)  is  the  set  of  edges 
matchable  relative  to  M). 

Geariy,  if  we  choose  the  controls  appropriately  we  can  use  this  controlled  Markov  process  to 
mimic  the  basic  simulated  annealing  process  of  Subsection  2.1.2.  We  can  also  control  the  Markov 
process  to  mimic  a  mullistart -descent  algorithm  (although  only  at  half  speed).  To  do  this  we 
assume  that  X0  *  0  We  then  let  a,  =  1  for  0  S  t  <  S, .  where  S,  is  the  first  ume  that  a  maximal 
matching  is  reached  Then  we  let  d,  =  1  for  S,  S  t  <  2S,.  which  guarantees  that  X,  =  O  for 
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i  =  2S  i .  We  then  keep  repeating  this  process. 


The  family  of  graphs  we  will  focus  on  is  {Gj.  G2,  G3.  ),  where  Ga  *  (V.E). 


V  »  {utJ:  1  £  i.  j  £  n+l)u{vg:  1  £  i.  j  S  n+1). 

*♦1  ■ 

E  «  HuB.  H  »  u  H..  B  »  u  B„ 
rt 

Hj  *  llu.j.v.j]:  IS  i  £  n+1 }  for  j  such  that  1  £  j  £  n+1, 


B,  =  { lv,j.ukr>.| ):  1  £  i.  k  £  n+1 }  for  j  such  that  1  £  j  £  n 

Graph  Gn  is  a  bipartite  graph  with  2(n+I)2  nodes  and  (n+l)3  edges.  For  each  j.  the  subgraph  of  Ga 
induced  by  the  nodes  of  edges  of  Bj  is  a  complete  bipartite  graph,  and  the  subgraph  of  Ga  induced 
by  consists  of  n+1  disjoint  edges.  The  set  of  edges  H  is  a  matching,  and  it  is  maximum  since  it 
leaves  no  nodes  exposed.  In  addition,  there  are  no  other  maximum  matchings  since,  by  induction, 
any  matching  which  has  no  nodes  exposed  must  include  the  edges  in  H,.  H2,  .  .  .  .  H,*.,.  As  an 
example.  G3  is  sketched  in  Figure  2.2.1. 

The  main  result  of  this  section  is  the  following  theorem. 

Theorem  2  2  /  There  exist  positive  constants  o,  and  o2  such  tb...  the  following  is  true.  For 
any  n  2:  1.  let  (X.a.d)  be  a  controlled  process  for  finding  the  maximum  matching  of  Gn  satisfying 
conditions  C  l  and  C.2  Define  R’  by 

R*  =  min(k:  Xk  is  a  maximum  matching) 


E(R*  |Xo  =  0j  2  0]Cxp(o;n). 


In  the  proof  of  Theorem  2.2.1,  a  function  g(M)  is  used  to  measure  the  distance  from  a  match 
ing  M  to  the  unique  maximum  matching  H.  We  will  present  the  funcuon  g  after  defining  some  of 
its  components  Let  V^M)  »  *  0  For  all  j,  such  that  1  £  j  S  n+l.  let  IJ(M)  iresp. 


Figure  2.2.1.  Sketch  of  G? 
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Vj(M))  be  the  number  of  nodes  in  (u^:  1  S  i  £  n+1)  (resp.,  (v^:  1  £  i  £  n+1})  that  arc  exposed 
relative  to  M.  The  function  g  is  such  that,  for  a  matching  M, 

g(M)  =  c|BnM|  +  £  ntfV/MXU^CM)). 

H 

where  c  =  18, 

\y(x,y)  =  2  min{x,y)  +  I(x>0r*o^). 

and  I,  !  is  the  indicator  function.  Note  that  g(M)  includes  the  term  |BnM|  and  a  second  term 
which  is  related  to  the  set  of  edges  in  B  that  are  matchable  relative  to  M.  Some  trivial  properties 
of  g  are  that  g  is  nonnegative.  g(H)  =  0,  and  g(0)  *  2n(n+l). 

The  next  set  of  lemmas  and  definitions  is  used  to  show  that  (g(Xk):  k  2  0)  tends  to  drift  away 
from  zero  (and  hence  (Xk:  k  £  0)  drifts  away  from  H)  when  (gCXJ:  k  £  0)  is  below  a  certain 
threshold  (see  Equation  (2.2.3),  p.  18).  After  the  lemmas  and  definitions  the  proof  of  Theorem 
2.2  1  is  presented. 

Lemma  2  2.1  Suppose  x.y  2  0.  Then 

€  (1,2,3)  if  y  i  max(x,l) 

(a)  v(x+l,  y)  -  y(x,y)  • 

=  0  otherwise, 

k 

6  (1,2,3)  if  x  2  max(y,l ) 

(b)  y(x,  y+1)  -  y(x,y)  < 

*  0;  otherwise, 

(c)  \y(x+l,  y+1)  -  y(x.y)  e  (2,3). 

Proof  Easy  by  inspection  of  Table  2.2. 1 . 

□ 


.•a 


■id 


. ) 


<v 

.'*1 

w 

■Si 


Table  12.1.  Values  of  y(x,y). 


0  1  2  3  4  5  6 


0 

0 

0 

0 

0 

0 

2 

3 

3 

3 

3 

3 

3 

4 

5 

5 

5 

5 

3 

5 

6 

7 

7 

7 

3 

5 

7 

8 

9 

9 

3 

5 

7 

9 

10 

11 

3 

5 

7 

9 

11 

12 

Lemma  2.2.2:  Let  M  be  a  matching  of  G„. 

(a)  Suppose  e  e  MnHr  Then  g(M-e)  *  g(M)  >  0  if  and  only  if 

Vrl(M)  i  maxlUjfM),!}  or  U^(M)  £  maxfV/M),!}. 
tt»  g(M-e)  -  g(M)  e  (0,  1 . 6}  for  e  e  M  n  H. 

(c)  g(M-e)  -  g(M)  €  {-c+2,  -c+3}  for  e  e  M  n  B. 

Proof  It  is  easy  to  see  that,  for  e  €  M  n  H,, 

g(M-e)  -  g(M)  =  U/M)  +  1)  -  v(VH(M),  U/M)) 

♦  V(V/M)  +  1,  U^,(M))  -  VfV/M).  U^,(M)). 

and,  for  e  €  M  n  B; 


.r'V’ 


•VI 

•V 


*'*i 

’Hi 

£ 


g(M-e)  -  g(M)  =  -  c  +  v(Vj(M)  +  1,  Uhl(M)  +  1)  -  v(Vj(M),  Uj*,(M)). 
Note  that  Lemma  2.2.2  can  be  easily  deduced  from  these  equations  and  Lemma  2.2.1. 


For  each  matching  M  of  Gn,  define 

A(M)  =  {e  is  matchable  relative  to  M  and  g(M+e)  *  g(M)), 

D(M)  =  {e  e  M:  g(M-e)  *  g(M)}, 

A+(M)  =  {e  €  A(M):  g(M+e)  >  g(M)}.  A_(M)  =  {e  e  A(M):  g(M+e)  <  g(M)}, 


D+(M)  =  {e  e  D(M):  g(M-e)  >  g(M)}. 


D_(M)  =  {e  €  D(M):  g(M-e)  <  g(M)}. 


Lemma  2.23:  Let  M  be  a  matching  of  G„  and  let  0<5<1.  Then 

(a)  D+(M)  c  MnH,  D_(M)  =  MnB,  A_(M)  c  H-M,  A+(M)  =  BnQ(M), 


(b)  |A_(M)|  £  2|A+(M)(, 


(c)  |D_(M)|£n5  if  g(M)  <  nc5. 


(d)  |D+(M)|  S  n(l  -  5(-|  +  1))  if  0  <  g(M)  <  ncS. 


Proof  Part  (a)  is  a  consequence  of  parts  (b)  and  (c)  of  Lemma  2.2.2  and  the  fact  that  c  >  3. 

We  will  now  prove  the  following  two  facts,  which  imply  part  (b):  every  edge  in  A_(M)  has  a 
node  in  common  with  an  edge  in  A+(M)  and  for  every  edge  in  A+(M)  there  are  at  most  two  edges 
in  A_(M)  that  have  nodes  in  common  with  it.  Let  e  e  A_(M).  Then  e  e  H,  for  some  j  by  part 
(a),  and  moreover  at  least  one  of  Vr,(M)  or  Un!(M)  is  strictly  positive  by  pan  (a)  of  Lemma 
2.2.2.  Thus,  there  is  at  least  one  edge  e'  in  Bj_,uBj  which  is  matchable  relative  to  M  and  has  a 
node  in  common  with  e.  Then  e'  is  in  A+(M)  and  hence  we  can  conclude  that  every  edge  in 
A_(M)  has  a  node  in  common  with  an  edge  in  A+fM).  On  the  other  hand,  part  (a)  implies 


A_(M)  c  H,  A+(M)  c  B,  and  therefore  for  every  edge  in  A+(M)  there  are  at  most  two  edges  in 
A_(M)  that  have  nodes  in  common  with  it.  Part  (b)  is  proved. 

By  part  (a) 

|D_(M)|  £  |BnM|  £  g(M)/c  <  n5. 

which  proves  part  (c). 

We  now  prove  part  (d),  which  will  complete  the  proof  of  Lemma  2.2.3.  Let  M  be  a  match¬ 
ing  with  0  <  g(M)  <  nc5.  The  fact  that  g(M)  >  0  implies  that  M  is  not  equal  to  the  unique  max¬ 
imum  matching  H,  which  in  turn  implies  that  there  exists  at  least  one  exposed  node.  Since  g(M)  < 
nc,  M  contains  fewer  than  n  edges  from  B1uB2u  •  •  •  uBn.  Hence  MnBk  =  0  for  some  k.  Now, 
the  set  of  nodes 

Z  =  {vij:  1  £  i  £  n+1,  1  £  j  £  k}  u  {u^:  1  £  i  £  n+1,  k+1  £  j  £  n+1 } 

contains  exactly  half  of  the  nodes  of  the  graph.  Since  MnBk  =  0,  each  edge  in  M  is  incident  to  a 
node  in  Z  and  a  node  not  in  Z.  Thus,  Z  contains  half,  and  therefore  at  least  one,  of  the  exposed 
nodes,  so  at  least  one  of  the  2n  numbers 

V,(M) . Vn(M),  U2(M) . U^,(M) 

is  nonzero.  By  the  symmetry  between  the  Uj's  and  Vj's,  we  can  restrict  attention  to  the  case  that 
for  some  j  with  1  £  j  £  n,  Uj+1(M)  is  as  least  as  large  as  any  of  the  other  2n-l  numbers.  Then 
U,+i(M)  £  max{  VjCND.l }  so  Mr\Hj  c  D+(M)  by  part  (a)  of  Lemma  2.2.2.  Hence 

|D+(M)|  £  |MnHj|  =  rn-1  -  V;(M)  -  [MnBj| 

=  n+1  -  minfV/M),  U^ifM)}  -  |Mr\Bj| 

£  n  -  g(M)/2  -  g(M)/c  £  n(l  -  8(-j  +  1», 


which  proves  part  (d). 


Lemma  2.2.4:  Now  set  5  =  1/43.  If  M  is  a  matching  of  Gn  such  that  0  <  g(M)  <  nco,  then 


|A(M)  I"1  £  [g(M+e)  -  g(M)]  £  1  if  A(M)  *  0 

e€  A(M) 


(2.2.1) 


and 


|D(M)  I'1  £  [g(M-e)  -  g(M>]  ^  -J-  if  D(M)  *  0. 

e«D(M)  2 


(2.2.2) 


Proof:  By  part  (a)  of  Lemma  2.2.3  and  parts  (b)  and  (c)  of  Lemma  2.2.2,  we  have 

c  -  3  e  €  A+(M) 


g(M+e)  -  g(M)  5  < 


-  6  if  e  e  A_(M), 


which,  together  with  part  (b)  of  Lemma  2.2.3,  yields 


|A(M)  I"1  £  [g(M+e)  -  g(M)]  2>  |A(M) |-,[(c-3) K(M)|  -  6 |A_(M) |]  I>  1 

ee  A(M) 

Similarly,  by  part  (a)  of  Lemma  2.2.3  and  parts  (b)  and  (c)  of  Lemma  2.2.2,  we  have 


g(M-e)  -  g(M) 


2:  1 


if  e  e  D+(M) 


£  -  c  +  2  if  e  e  D_(M), 


which,  together  with  parts  (c)  and  (d)  of  Lemma  2.2.3,  yields  (2.2.2). 

□ 

Proof  of  Theorem  2.2.1:  Let  t(0)  =  0,  and,  for  all  k  >  0,  let  t(k)  =  min  {t  > 
t(k-l):  g(XJ  *■  gCX^ic.!))}.  We  can  and  do  assume  that  P[R*  <  +°°|X0  =  0]  =  1.  It  follows  that, 
with  probability  one,  R*e  (t(l),  t(2),  .  .  .},  which  implies  that 

P[t(k+1)  <  oo|R*  >  t(k),  X0  =  0]  =  1. 

Given  a  matching  M,  define  s(M,l)  and  s(M,-l)  to  be  the  normalized  sums  appearing  in  Ine¬ 
qualities  (2.2.1)  and  (2.2.2)  respectively,  whenever  they  are  well  defined.  By  Lemma  2.2.4,  s(M,9) 
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>  1/2  if  0  <  g(M)  <  ncS  and  if  s(M,0)  is  well  defined. 

Let  ©k  =  1  if  the  jump  at  time  t(k+l)  is  caused  by  the  addition  of  an  edge,  and  let  ©k  =  -1  if 
the  jump  at  time  tk+1  is  caused  by  the  deletion  of  an  edge. 

Suppose  M  is  a  matching  with  0  <  g(M)  <  ncS.  Then 

E[g(Xt{k+i))  -  =  M,  0k  =  0,  Xl(k+1)_2,  .  .  .  ,X0]  =  s(M,0)  £  I 

Averaging  over  appropriate  values  of  ©j.  and  (X^  t(k)  <  i  <  t(k+l)),  it  follows  that 

E[g(Xi(k+i))  -  g(XI(k))  -  -j  |  g(X1(k))  <  ncS,  R*  >  t(k),  Xo,  .  .  .  ,Xt(k)]  ^  0.  (2.2.3) 

Also,  by  parts  (b)  and  (c)  of  Lemma  2.2.2,  the  magnitudes  of  the  increments  of  g(X,)  are  bounded 
by  c  -  2.  Thus,  Theorem  2.3  of  [13]  is  in  force  if  we  define  (Y,  £q,  a*,  b*)  by 

Yk  =  -g(Xl(k)),  £o  =  -j,  a’  =  -n5c,  and  b*  =  0.  (2.2.4) 

Using  the  fact  that  Y0  =  -g(0)  =  -2n(n+l)  <  a’,  this  produces  constants  rj  >  0,  p  e  (0,1)  and 
D*  >  0  such  that 

P[g(Xt(k))  =  0,  R*  >  t(k— 1)  |X0  =  0]  <  u,  (2.2.5) 

where  u  =  D*exp(-Tinc5)/(l-p).  The  term  P[R*  =  t(k)[X0  =  0]  is  less  than  or  equal  to  the  left 
side  of  Inequality  (2.2.5),  because  if  Xl(k)  is  the  maximum  matching  then  g(Xl(k))  is  equal  to  zero. 

Therefore,  P[R*  =  t(k)jX0  =  0]  £  u.  Since  R*  e  [t(l),  t(2), . .  .}  and,  for  all  k,  t(k)  £  k,  we  have 

k 

P[R*  >  kjX0  =  0]  >  P[R*  >  t(k)JX0  =  0]  =  1-X  P[R*  =  t(j)|X0  =  0]  >  max{0.  1-ku). 

r  i 

Hence, 


.-V 


/  .■  /  , 
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E[R* |X0  =  0]  =  £  P[R*>k|Xo  =  0]2t  £  max{0,  1-ku)  £ 

k-0  W>  2u 

Thus,  taking  Oj  =  (l-p)/2D*  and  o2  =  tjc8,  Theorem  2.2.1  is  proved. 

□ 

Remark:  Some  extra  work  shows  that  Conditions  D.l  and  D.2  of  [13]  are  satisfied  for  Y,  a* 
and  b*  given  in  (2.2.4)  and  r|  =  .0033683,  p  =  .9998,  a*  =  -nSc,  b*=0  and  D*  =  1.  This  shows 
that  Theorem  2.2.1  above  is  true  for  a \  =  .0001  and  o2  =  .0014. 


2.3.  Near  Maximum  Matching  in  Polynomial  Time 

Let  d*  denote  the  maximum  node  degree  of  the  graph  G  and  let  m*  denote  the  maximum  of 
the  cardinalities  of  matchings  in  G.  The  next  theorem  is  the  main  result  of  this  section. 

Theorem  2.3.1:  Let  P>  1.  Consider  a  run  of  the  basic  simulated  annealing  algorithm  (of 
Subsection  2.1.2)  with  Tk  =  T  for  all  k,  where  exp(-l/T)  =  X,  and  X  is  given  by 


X  =  — - —  and  to*  =  p  |  V  |(2d*)M. 

3|V|<o* 


Let  R  denote  the  random  time  R  =  min{k:  |Xk(  S  m*  (1--^)}.  Then  ER  $  24p2|V|5(2d*)2p"2. 


Remarks:  (1)  If  P  and  d*  are  bounded  as  |V|  — >  «,  then  ER  =  0(|V|5),  In  the  proof  below 
we  see  that  three  of  these  five  factors  of  J  V  J  arise  from  our  upper-bound  Dj  on  the  mean  time  the 
algorithm  takes  to  make  a  single  move.  A  smaller  average  run  time  can  be  achieved  by  using  an 
efficient  implementation  of  an  algorithm  that  simulates  XJ(i),  Xjm,  ....  where  J(k)  is  the  time 
process  (Xk:  k  >  0)  makes  its  kth  move  (see  [14]). 

(2)  Since  2d*  5  2|V|,  we  have  with  no  restriction  on  d*  that  ER  <,  6P222^|V|3+2(J.  Also  note 
that  if  P  >  m*  then  XR  is  a  maximum  matching. 
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t 

i 

(3)  We  will  briefly  comment  on  our  choice  of  constant  T  (equivalently,  on  our  choice  of  tem¬ 
perature).  It  must  be  large  enough  so  that  the  process  (Xk:  k  2  0)  "jumps  sufficiently  often", 
which  is  reflected  in  the  bounds  given  in  Lemmas  2.3.1  and  2.3.2a  below.  On  the  other  hand,  T 
!  must  be  small  enough  so  that  there  is  a-  net  drift  towards  larger  matchings,  enabling  us  to  obtain 

I  the  bound  of  Lemma  2.3.3  below. 

We  chose  Tk  to  be  independent  of  k,  though  we  can  see  some  motivation  for  letting  it 
decrease  as  k  increases.  More  precisely,  it  is  clear  that  an  improved  algorithm  can  be  obtained  by 
lening  Tk  be  a  decreasing  function  of  jXk|.  For  example,  it  is  shown  in  the  proof  of  Claim  2.3.1 
below  that  a  matching  M  in  G  has  an  augmenting  path  of  length  at  most  1  +  2|M|/(m*-|M|). 
This  bound  increases  sharply  as  |M|  approaches  the  final  value  of  |Xk|,  which  is  m*(l-l/p),  and 
in  the  proof  we  replace  |Mj  in  the  bound  by  this  final  value.  However,  working  with  the  |M|- 
I  dependent  bound  shows  that  a  larger  value  of  T  can  be  used  when  |Xk|  is  small  so  that  the  algo¬ 

rithm  "jumps  more  often,"  while  maintaining  a  sufficient  drift  towards  larger  matchings. 

|  We  chose  Tk  to  be  independent  of  k  primarily  for  two  reasons:  (1)  we  wanted  to  demon- 

!  strate  that  Tk  can  be  chosen  independently  of  the  algorithm  state  ("open-loop"  in  control-theoretic 

l 

terms)  and  (2)  we  do  not  think  the  complexity  bounds  can  be  improved  much  by  letting  Tk  be 
|  either  a  function  of  |Xk|  or  a  decreasing  function  of  k,  because  our  choice  of  Tk  is  tuned  to  the 

situation  when  |Xk|  is  close  to  its  target  value  m*(l— 1/(3),  and  this  situation  is  the  most  time  con¬ 
suming  for  the  algorithm  anyway. 


Finally,  we  think  it  is  significant  that  we  need  to  decrease  T  as  the  problem  size  increases.  It 
suggests  that  if  the  sequence  TJt  T2,  .  .  .  is  to  be  chosen  independently  of  the  graph,  it  should  be 


decreasing. 
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Proof:  Define  a  random  process  (Yk:  k  £  0)  by 


Yj  =  Xj 


i  *  0,1,2,. 


where  J(0)  =  0  and,  for  i  t  0, 


and  define 


J(i+1)  =  min{k  >  J(i):  Xk  *  Xk_,}, 


R(Y)  =  min{i:  |YJ  2>  m*  (1  -  ±)). 

P 


Note  that  lY^i  |  -  |Y,|  e  {-1,1 }  with  probability  one  for  each  i. 


Next,  define  a  random  process  (2*:  k  £  0)  by 


Zi  *  Ysn)  »  =  0.1. 


where  S(0)  =  0,  S(l)  =  1  and,  for  i  et  1, 


and  define 


S(i+1)  =  min{j:  j  >  S(i),  |Yj|  -  (Y^l  e  {-2,  2}}. 


R(Z)  =  min(i:  |2^|  2  m’(l  -  i-))- 

P 


Define  constants  Dj,  D2  and  D3  by 


D,  =  6|V|2a>‘, 


D2  =  2  co*. 


D3  =  2|V|. 


Lemma  2.3.1: 


E[J(i+l)  -  J(i)|J(i),  (X0,  X, . Xj(i))]  S  D,. 


Lemma  2.3.2a: 


E[(S(i+l)  -  S(i))I(i  <  R(Z))  |S(i),  (Y0,  Y! . YS{i))]  S  D2. 


Lemma  2.3.3: 


ER(Z)  <  D3. 

We  will  next  prove  Theorem  2.3.1,  assuming  the  lemmas  are  true.  We  have 


2 2 


ER  =  EJ(R(Y))  -  e£  (J(i+1)  -  J(i))I,l<Rroi  =  £  E[J(i+l)  -  J(i)|i  <  R(Y))P(i  <  R(Y)]. 

MS  M3 

Now  the  outcome  of  the  event  (i  <  R(Y)}  is  determined  by  (J(i),  (Xq,  .  .  .  so  we  can  apply 

Lemma  2.3.1  to  get 

ER  £  £  DjPfi  <  RfY)J  =  D|ER(Y). 

M3 


Similarly,  the  fact  R(Y)  *  S(R(Z))  and  Lemma  2.3.2a  imply  that  ER(Y)  £  D2ER(Z).  We  conclude 
that  ER  £  D1D2D3  from  ER  £  D,ERY.  Lemma  2.3.3,  ER(Y)  £  E^ERfZ),  and  Lemma  2.3.3.  This 
will  establish  the  theorem  once  we  prove  the  three  lemmas  above. 

□ 

Proof  of  Lemma  2.3.1:  By  the  strong  Markov  property  of  (Xk:  k  £  0), 


E[J(i+l)  -  J(i)|J(i),  (Xq.  X, _ Xj(i))]  =  E[J(i+l)  -  J(i)|YJ. 

Since  A.k  =  A.  for  all  k,  the  transition  probabilities  of  X  are  time  invariant,  so 

E[J(i+l)  -  J(i)|Y,  =  M]  =  P[Xk+1  *  Xk|Xk  =  M]"1. 

Now,  fix  a  matching  M.  One  of  two  cases  is  true: 

Case  1  Some  edge  in  E  is  matchable  relative  to  M.  Then 

P[Xk+1  *  Xk|Xk  =  M]  2 

|E| 

Case  2:  No  two  of  the  |V|  -  2|M|  exposed  nodes  are  connected  by  an  edge  in  the  graph. 

Then 


Pt*k+1  *  Xk|Xk  =  M]  =  *  _A_. 

|E|  2 1 V  | 

Hence.  in  either  case,  E(J(i+l)  -  J(i)|Y,  =  M]  £  max{  |Ej,  2jV|/X)  *  Dj. 


Proof  of  Lemma  2.3  2a  Lemma  2.3.2a  is  trivial  for  i  =  0,  so  we  fix  i  with  i  2  1.  Let  m  be 
an  integer  with  1  Sm<  m*(l-l/p).  Define  a  set  of  matchings  B  by 

B  =  {M:  |M|  =  m-1  or  |M|  =  m}  and  let  M0  be  a  fixed  matching  in  B.  Consider  the  event 
F  =  (Yso)  =  YS(i>-i  e  BandR(Z)>i}.  The  outcome  of  F  is  determined  by 
(S(i),  (Y0.  Yi.  •  Ys(i>))'  and  the  union  of  events  of  the  form  of  F  as  M„  and  m  vary  as  above  is 
equal  to  the  event  {R(Z)  >  i).  Hence,  it  suffices  to  prove  that 

E[S(i+l)  -  S(i)|S(i).  (Yq.  Y, . Y^JIp  £  D2 

for  arbitrary  fixed  values  of  m  and  M0  as  above.  Without  loss  of  generality,  we  assume  that  if 
|M0|  =  m-1  then  Q(M0)  *  0;  otherwise,  F  =  0  (recall  that  Q(M„)  is  the  set  of  matchable  edges 
relative  to  Mj. 

If  the  event  F  is  true  then  S(i+1)  =  min{j  >  S(i):  YjtfB).  Using  this  and  the  strong  Maritov 
property  of  (Yk:  k  2  0)  we  have 


E[S(i+l)  -  S(i)|S(i).(Yo. - YS(l>)]IF  =  E[min(j  >  S(i):YJ  4  B}  -  S(i)|YS(i)  *  MJIp 

-  E[S  |  Y o  =  MJIF, 


(2.3.1) 


where  S  denotes  the  stopping  time  S  =  min{j  2  1:  Yj  4  B). 

Let  B  be  the  set  of  matchings  B  =  (M:  |M |  2r  m-l).  Note  that  B  c  B  We  let  (Yk:  k  £  0) 
denote  a  stauonary-transiuon  Markov  chain  with  state  space  §  and  one-step  transition  probabilities 
determined  by  conditioning  CYk.  k  >  0)  to  stay  in  B  for  each  consecutive  jump: 

PfYk<.t  =  M'  |  Yk  =  M|  =  P[Yk*,  =  M'|Yk  =  M]/A(M) 


si njr 


TO 
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for  M.  M'  in  B.  where  A(M)  =  P{Yk+1  €  B|Yk  =  M]. 

Define  a  stopping  time  S*  by  S*  *  min{k  ^  1:  Yk  e  §  -  B ) .  Let  Sl.  denote  a  random  variable 

on  the  same  (or  possibly  enlarged)  probability  space  as  (Yq,  Y,,  .  .  .)  such  that 

PIS.  >  k|Y0.Y,....l  =  fl  ^CY,). 

i-o 

Let  S- min(S..  S.).  If  we  impose  the  conditions  Y0  =  M0  and  Y0  =  M0  then 

(S,  (Yk;  0  £  k  <  S))  and  (S.  (Yk:  0  S  k  <  S))  have  the  same  distribution.  Since  S  £  S..  it  follows 

that 

E[S  |  Y0  =  MJ  *  E[S.  I Y0  *  MJ.  (2.3.2) 

Lemma  2.3.2a  is  implied  by  (2.3.1),  (2.3.2)  and  Lemma  2.3.2b,  which  is  stated  and  proved  next 


Lemma  2  3  2b  Under  the  conditions  given  in  the  proof  of  Lemma  2.3.2a 


E(SjY0  =  MJ  S  2(0*. 


Proof  of  Lemma  2.3.2b:  Either  jM0|  =  m  or  |M0|  =  m-1.  We  will  prove  that  if  |MC|  =  m. 


E[S.|Y0  =  M01  <,  2(i)’  -  1. 


(2.3.3) 


This  will  imply  the  lemma  in  general.  Hence,  we  assume  that  |Mj  =  m  for  the  rest  of  the  proof 


AM 


of  Lemma  2.3.2b. 


m 


For  any  matching  M,  let  f(M)  denote  the  length  of  the  shortest  augmenting  path  for  M.  The 
function  f(M)  is  well  defined  if  M  is  not  a  maximum  matching  (in  particular  if  |M|  =  m),  and 
ft  Ml  e  (1.  3,  5.  .  .  .}.  Let  L  denote  the  maximum  of  f(M)  over  all  M  with  |M|  =  m. 


Claim2  3  J  (15)  L  S  2|3- 1 


Proof  Let  M  be  a  matching  with  |M|  =  m  and  let  M*  be  a  maximum  cardinality  matching  in 
G  Let  G'  denote  the  graph  with  a  set  of  nodes  V  and  a  set  of  edges  M®M*.  where  M®M* 
denotes  the  symmetric  difference  of  M  and  M*  Each  node  m  G'  has  incident  to  it  at  most  one 
edge  from  M  and  at  most  one  edge  from  M*.  Thus,  all  maximal  connected  components  of  G'  are 
paths  or  cycles,  and  all  cycles  have  even  length  The  cycles  and  even  length  paths  each  have  an 
equal  number  of  edges  from  M  and  M*,  while  each  odd  length  path  has  exactly  one  more  edge  of 
M*  than  M  and  is  an  augmenting  path  for  M  Thus,  there  are  at  least  m*  -  m  node-disjoint  aug¬ 
menting  paths  for  M,  which,  altogether,  have  at  most  m  edges  of  M.  Thus,  one  of  the  augmenting 
paths  has  no  more  than  m/(m*-m)  edges  of  M  and  hence  has  length  ai  most  l+2m/(m*-m) 
Finally,  l+2myfm*-m)  S  2(5-1,  since  m  <  m*(l-l/j3),  and  the  proof  is  complete 

□ 


Claim  2  3  2.  Suppose  M  is  a  matching  with  |M|  =  m  and  define  po  and  p,  by 

1  d* 

Po  =  —  and  P!  =  min(  — ,1  -  Po) 

2m  m 

Then  for  all  k  >  0 

fa)  ^  f(M)  -  2|Ya  =  M]  £  Po  if  ffM)  2  3. 

fb)  P[  I  |  >  m+1  lYjk  =  M]  ^  po  if  f(M)  =  1, 

(C)  PlffY^j)  >  ffM)  +  21^  =  M]  =  0. 
fd)  P[f(YW  =  f(M)  +  2|Ya  =  M]  S  p,. 


Proof  We  will  first  prove  part  fa)  under  the  assumption  that  ffM)  £  5.  Choose  an  augment 
ing  path  p  for  M  of  length  ffM)  and  label  some  of  the  nodes  and  edges  of  it  as  indicated  in  Figure 
2.3.1  Since  p  is  an  augmenting  path  of  the  shortest  length,  no  neighbor  of  ult  except  possibly 


node  v,,  can  be  an  exposed  node.  Also,  if  u,  and  v,  are  neighbors,  then  w,  and  v2  are  noL  Thus, 
there  are  at  most  two  choices  for  an  edge  e',  namely  C!  and  possibly  either  [ut.v,]  or  [w,,vj],  such 
that  f(M  -  e,  +  e')  2  f(M).  There  is  also  at  least  one  choice  of  e',  namely  e'  =  elt  such  that 
ft M  -  C[  +  c')  =  f(M)  -  2.  Thus. 

PlfrtW  =  KM)  -  2  lY^*,  =  M  -  e,u  1/3. 

This  is  true  with  ei  replaced  by  e2  as  well,  so 

P!f(  Ya.:)  =  f(M)  -  2|Ya  =  M]  2  PIY^,  =  M  -  e,  or  Ya*,  =  M  -  e^ |Ya  =  M)/3 

=  —2—  *  —1—  =  Po 

3  j  M  |  2 1 M  | 

This  establishes  pan  (a)  if  f(Mj  >  5. 

We  will  now  complete  the  proof  of  part  (aj  by  considering  the  case  ff NT)  =  3.  Let 
,'v:,  w,.  w:  v;]  be  an  augmenung  path  for  M  of  length  3  and  let  e  =  [w,.  w2].  Then  e  is  in  M, 
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and  nodes  v,  and  v2  are  not  neighbors.  Now,  if  e'  is  an  edge  such  that 


f(M-e  +  e')i3 

then  e'  must  be  incident  to  either  v,  or  w,  and  to  either  v2  or  w2. 


(2.3.4) 


Moreover,  if  e'  =  (v,.  wj  is  such  an  edge,  then  v2  and  w,  must  not  be  neighbors.  Thus, 
there  are  at  most  two  choices  of  e'  such  that  (2.3.4)  is  true,  namely  e  and  possibly  one  of  [v,,  wj 
or  [v2,  w,j  There  are  also  at  least  two  values  of  e'  such  that  f(M  -  e  +  eO  *  1,  namely  [v(,  wt] 
and  [v2,  w2].  Thus. 

P[ffYa+2)  =  f(M)  -  2 |Ya  =  Ml  2  ^  pt*2k+i  =  M  -  e|Ya  =  M]  =  p„. 

4 

Pan  (a)  is  proved. 

Turning  to  pan  (b),  assume  that  f(M)  =  1.  Then  Q(M)  is  not  empty.  Hence, 

P!  | Ya+,  |  a  m+1  |Ya  *  M)  -  PtlYa*, |  *  m+1  |Ya  *  Ml  *  |Q(M) |/( |Q(M)|  +  Xm) 


2:  (1+mX)-1  2  (1+m)"  ^  Po* 


so  that  pan  (b)  is  proved. 


Pans  (c)  and  (d)  will  now  be  proved.  Choose  an  augmenting  path  p  for  M  of  length  f(M). 


T*  =  {(c[Xz):  e(  e  M.  e2  is  matchable  relative  to  M  -  e|,  and  f(M  -  ej  +  c^)  £  f(M)  +  2). 
Suppose  (C\,ez)  €  r„.  Then  e|  and  e2  are  incident  to  a  common  node  (otherwise  et  is  matchable 

relative  to  M  -  +  e2.  e2  is  matchable  relative  to  M,  and  hence  f(M  -  ei  +  e^)  =  f(M)  =  1,  a  con¬ 

tradiction)  and  e,  *  e2.  Since  p  is  not  an  augmenting  path  for  M  -  e(  +  e2,  at  least  one  of  e,  or  e2 
is  incident  to  a  node  of  p  This  means  that  either  e(  is  an  edge  of  p  or  e2  is  incident  to  one  of  the 
exposed  nodes  on  the  ends  of  p.  Thus,  we  have  narrowed  down  the  possibilities  to  one  of  the  four 
cases  shown  in  Figure  2.3.2.  We  can  rule  out  the  first  three  of  these  cases,  because  in  these  cases 
there  is  an  augmenung  path  for  M  -  ej  +  e2  with  length  at  most  the  length  of  p.  We  have  thus 


lw»l 


.  ,1  l 


.<4 i?V 


Figure  2.3 2.  Four  possibilities  for  (e^e^  are  pictured.  Edges  in  the  path  p  are  drawn  straight 
and  horizontally.  Edges  in  M  are  bold.  Nodes  Vj  and  v3  are  the  end  nodes  of  an  augmenting  path 
for  M  -  et  +  e2.  Only  the  fourth  possibility  can  really  occur. 


shown  that  if  (e^e^)  €  T+,  then  e2  is  incident  to  an  exposed  node  of  p,  ej  and  e2  are  incident  to  a 
common  node,  and  ej  is  not  in  the  path  p.  It  follows  that  f(M  -  ej  +  e^  =  f(M)  +  2,  for  any 
(e^e^)  in  r+,  which  proves  part  (c). 

Define 

W  =  fe2:  (ej.e^  e  T+  for  some  e^. 

If  e  <=  W  then  there  is  exactly  one  edge,  call  it  w(e),  such  that  (w(e),  e)  e  T+.  Each  edge  in  W  is 
incident  to  an  exposed  node  of  p  so  that  |W|  <,  2d*.  Thus, 

P[f(Ya.2)  =  ffMH-2 1 =  M]  =  £  P[Y2k+2  =  M  -  e,  +  e2,  Y^,  =M-e,|Y;jc  =  M] 


§ 

« 


s 

V 


8 


a 


8 


i 


i 


*v 

K* 


* 

r’ 


29 


I  P[Yzk+2  =  M  -  w(e)  +  e|Y2k+i  =  M  -  w^PtY^  =  M  -  w(e)  |  Y*  =  M] 

eeW 


*  I 


w  |Q(M-w(e))|  JM  | 


1  ^  *  d 


2  |M|  |M| 


Together  with  parts  (a),  (b),  and  (c),  this  proves  part  (d)  so  that  Gaim  2.3.2  is  completely  proved. 

□ 

We  will  now  complete  the  proof  of  Lemma  2.3.2b  using  Gaims  2.3.1  and  2.3.2.  Define  a 

process  (Uk:  k  £  0)  by  Uk  =  -j(l  +  fCY^))^  ssj-  Note  that  Uk  takes  values  in  {0,  1 .  L) 

where  L  =  (l+L)/2.  Gaim  2.3.1  implies  that  L  ^  p  and  Qaim  2.3.2  implies  that 


^  Po  if  i  =  j-1 
P[Uk+,  =i|Uk=j,Uk_, . U0]  “I  =0  ifi^j+2 


<,  p,  if  i  =  j+1 


(2.3.5) 


Let  (Wk:  k  >  0)  denote  the  Markov  chain  with  one-step  transition  probabilities  shown  in  Figure 
2.3.3.  From  Equation  (2.3.5)  it  follows  that  if  W0  =  Uo.  then  the  Markov  chain  (Wk:  k  >  0)  sto¬ 
chastically  dominates  the  process  (Uk:  k  >  0).  Hence. 

E[SJY0  =  MJ  +  1  =  2E[min{j:  Uj  =  0}  |Y0  =  MJ 


V 


.V 


A 

ft 


S  2E[min(j:  Wj  =  0}  |W0  =  f(MJ] 


<  2E[min{j:  W;  =  0}  |W0  =  L] 
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Figure  2JJ.  One-step  transition  probabilities  for  the  Markov  chain  (Wk:  k  £  0). 


in  „  f  '1&-1 

S  —  —  Si£il  <  2o)‘. 

Po  Po  Po  Po 


This  establishes  Inequality  (2.3.3),  so  the  proof  of  Lemma  2.3.2b,  and  hence  also  the  proof  of 


Lemma  2.3.2a,  is  complete. 


Proof  of  Lemma  2.3.3:  In  the  first  part  of  the  proof,  we  will  refer  to  the  setup  in  the  proof  of 
Lemma  2.3.2a.  By  the  reasoning  there,  we  see  that  for  i  £  1, 


P[  |YS(i+i)  |  >  |YS(i)||S(i),(Y0Y, . YS(i))]lF  =  P[|Ysl  =  m+l|Yo  =  Mo]IF 


=  P[S_  >  S,  |Y0  =  MJIp 


S--1 


Ein  A(Yj)|Y0  =  Mo]IF. 
r<> 


The  term  A(Yj)  is  equal  to 


P[Yk+,eB|Yk  =  Yj]  = 


lQ(Yj)l 


lQ(Yj)|  +  X(m-1)  if  I  Xj  |  =  m-1 
1  if  |  Yj  |  2  m. 


As  in  the  proof  of  Lemma  2.3.2a,  we  assume  that  if  |M0|  =  m-l,  then  QfMJ  0.  Hence,  if 
Yu  =  M0  and  |Y0|  =  m-l,  then  Q(Yo)  *  0.  Moreover,  if  | Yj |  =  m-l  for  some  j  S  1,  then  YH  is 
a  matching  containing  Yj  and  so  Q(Yj)  *  0-  Thus,  given  Y0  =  M0,  Q(Yj)  *  0  whenever  [Yj|  = 

m-l,  for  all  j  >  0.  Therefore,  given  Y0  =  M0  we  have  A(Yj)  £  (1-Km-1)A.)~*  Also, 

note  that  |Yj|  >  m  for  at  least  half  of  the  values  of  j  with  0  £  j  £  S+-l.  Thus. 

S  —1 

E[  fl  A(Yj)|Yo  =  MJ  S  E[(l  +  2^1)'SJ2  |y0  =  M0]  2:  (1  + 
ro  1  1 


>  exp(-A.|V[a>72)  =  exp(-l/6)  >  5/6, 


where  for  the  second  inequality  we  used  Lemma  2.3.2b  and  Jensen’s  inequality,  and  for  the  last 
two  inequalities  we  used  the  inequality  exp(u)  t  1+u.  Therefore,  for  i  >  1, 


P[  |Zi+ll  >  |Zil  |Zo>  •  •  •  -ZJI{R(Z)>  i)  -  (5/6)I(R(Z)>  j). 
Now  IZi+J  -  |Zj|  e  { -2, -1.1,2}  so  that 


E[  |Z,+I  |  -  IZiUZo . ZiH{R(Z)  ^  >  (-| 


-  2  •— )I(R(Z)>  i}  -  •jI(R(Z)>il’ 


which  implies  that  the  process 


is  a  submartingale  uniformly  bounded  from  above.  Thus,  by  a  version  of  Doob’s  optional  sam¬ 
pling  theorem  [16,  Theorem  7.4.6.ii] 

E  |2r(Z)I  -  -(Z)2~J-  SE  >0, 

which  yields 

ER(Z)  <  2E|ZR(Z)|+1  <  2m*+l  <  |V|+1  £  D3. 

Lemma  2.3.3,  and  hence  Theorem  2.3.1,  are  completely  proved. 

□ 

2.4.  Simulated  Annealing  when  Temperature  Schedules  are  Constant 

In  this  section,  we  consider  the  limitations  of  restricting  our  attention  to  degenerate  tempera¬ 
ture  schedules  that  have  only  a  single  value.  The  main  result  of  this  section  is  the  theorem  follow¬ 
ing  the  next  set  of  definitions.  Let  G'(n,d*)  be  the  set  of  all  graphs  with  n  nodes  and  maximum 
node  degree  d*,  M'(G)  the  set  of  all  matchings  of  graph  G,  and  m*(G)  the  size  of  the  largest 
matching  of  G.  Let  R(G)  be  the  set  of  all  symmetric  transition  probability  matrices  R  over  M'(G) 
with  the  following  property:  for  all  i,  j  €  M'(G),  there  is  a  positive  integer  k  and  a  sequence 

i  =  i(l),  i(2) . i(k)  =  j  of  matchings  from  M'(G)  such  that  Ri<j>)i(h+i)  >  0  for  all  h  such  that  i  < 

h  <  k-1.  Let  c  be  the  cost  function  on  M'(G)  such  that  c(M)  =  -  |M|.  Suppose  R  is  a  transition 
probability  matrix  over  M'(G).  Let  (X^G,R,T):  k  £  0)  be  the  annealing  process  on  system 
(M'(G),  c,  R)  with  temperature  schedule  (Tk  :  k  >  0)  such  that  Tk  =  T  for  all  k  >  0. 

Theorem  2.4.1 :  Suppose  P  and  d*  are  real  numbers  and  n  is  a  positive  integer  such  that  —  £ 
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max  inf  min  max  E[min(k  :  |Xk(G,R,T)|  t  (l-B_1)m*(G)}  |  Xn  =  M], 

G«G'(n,d  *)  Te  (0,-h-J  Re  R(G)  MeM'(G) 

Then 


Q(p.  n,  d*) 


-  1. 


Remarks:  (1)  From  Theorem  2.4.1,  if  p  2  —  then 

4 


a  16 

fl(p,  n,  d*)  S;  «(i,  n,  d*)  ;>  - 1. 

4  5 

(2)  Note  that  Theorem  2.3.1  implies  that 

fl(p,  n,  d*)  £  24p2n5(2d*)2H 

(3)  Theorem  2.4.1  and  part  (2)  of  these  remarks  imply  that  if  P  is  at  least  two  and  is  con¬ 
stant  on  n  then  £}(p,  n,  n)  is  upper  and  lower  bounded  by  polynomial  functions  of  n  such  that  the 
functions  have  exponents  that  are  linear  functions  of  p. 

We  will  prove  Theorem  2.4.1  after  presenting  two  lemmas  and  defining  a  graph  A(XH,  Xl), 
which  is  a  generalization  of  the  graph  Gn  of  Section  2.2.  Graph  A(XH,  Xl),  shown  in  Figure  2.4.1, 
consists  of  4XHXL  nodes  and  has  maximum  node  degree  2XH  +  1.  Just  as  in  graph  Gn,  the  edges 

of  A(Xh,  XJ  are  partitioned  into  the  subsets  Hj,  H2 . H*.l,  B2 . BXj.-i-  These  subsets 

are  indicated  in  Figure  2.4.1.  For  each  i,  such  that  1  <  i  <,  XL,  H,  consists  of  2XH  edges  that  are 
disjoint,  and,  for  each  j,  such  that  1  £  j  £  XL-1,  the  subgraph  induced  by  the  nodes  of  the  edges  of 

Bj  is  a  complete  bipartite  graph  consisting  of  4XH  nodes.  Also  note  that  H  =  ^jH,  is  a  unique 

i*l 

maximum  matching  of  A(XH,  X^.  If  XH  =  Xl  =  n  then  A(XH,  Xl)  is  isomorphic  to  G^i  of  Sec¬ 
tion  2.2. 


IA 


KWivtsW 


each  path  is  characterized  by  a  sequence  of  edges  elf  e2 . e^+i.  where 

et€Hj.  e2eBJt  e3eHJ+I,  ev€Bj+1,  ....  e^+icH^  for  some  j  and  k.  We  will  call  such  kinds  of 
paths  stretched  paths.  Since  =  M,  there  is  a  one-to-one  and  onto  correspondence 
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We  will  now  proceed  to  show  that 


|T^|  (2XH-n)Xl~ 

|T.|  n+1 


(2.4.1) 


which  is  sufficient  to  prove  the  lemma.  Observe  that  from  each  element  t  of  T„  we  can  generate 
elements  of  T,^  in  the  following  way.  We  find  a  stretched  path  p  that  starts  in  Hlt  ends  in  H^, 

and  is  node-disjoint  from  any  of  the  paths  in  t.  Since  n  <  2ah  -  1,  there  are  at  least  (2XH-n)Xl- 
such  paths  p.  Then  we  add  the  path  p  to  set  t  to  form  an  element  of  T-,.  In  this  way,  each  ele¬ 
ment  of  T„  can  generate  at  least  (2XH-n)^L  elements  of  T^.  But,  each  element  of  Tn+]  can  be 
generated  in  this  way  by  at  most  n+1  elements  of  Tn.  We  can  then  conclude  the  Inequality  (2.4.1). 


Lemma  2.4.2:  Let  S  be  a  finite  set  and  c  be  a  cost  function  on  S.  Let  R  be  a  symmetric  tran¬ 
sition  probability  matrix  over  S  such  that  for  all  i.  j  e  S,  there  is  positive  integer  k  and  a  sequence 
i  =  i(l),  i(2), . i(k)  =  j  of  states  in  S  such  that  Ri(h)i(h+n  >  0  for  all  h  such  that  0  £  h  <,  k-1. 


Let  y  be  some  constant. 


CY  =  [s  e  S:  c(s)  <  y  and  R„'  >  0  for  some  s'  such  that  c(s')  >  y} , 
Y0  =  min{c(s):  s  e  S  -  Cy). 


Ay=  (s  e  S-  Cr  c(s)  =  y0}. 

Let  (Xk:  k  >  0)  be  the  annealing  process  corresponding  to  (S,  c,  R)  with  a  temperature  schedule 
(Tk:  k  >  0)  such  that  Tk  =  T  for  all  k  k  0.  Then  for  T  >  0  there  is  a  state  se  S  such  that 

E[min{k>0:c(Xk)SY)|Xo  =  s]  £  lAj/l^l. 

Proof:  If  or  Ay  are  empty  then  the  lemma  is  trivial  to  prove.  Thus,  we  will  assume  that 
both  and  Ay  are  not  empty.  Since  the  purpose  of  the  proof  is  to  lower  bound 
E[min{k  >  0:  c(Xk)  5  y)  |X0  =  s],  without  loss  of  generality  we  will  assume  that  all  states  in 


have  cost  exactly  equal  to  y.  We  have  assumed  that  T  >  0  and  R  is  a  symmetric  transition  proba¬ 
bility  matrix  over  S  such  that  for  all  i,  j  e  S,  there  is  an  integer  k  and  a  sequence 

i  =  i(l),  i(2) . i(k)  =  j  of  states  in  S  such  that  R^on-iy^  for  all  h  such  that  0  £  h  £  k-1, 

and.  therefore,  process  (Xk:  k  >  0)  has  the  limiting  distribution  (re,:  s  €  S),  called  the  Gibbs  distri¬ 
bution,  where  nt  -  exp(-c(s)/T)/  £  exp(-c(a)/T).  Then 

(5€  S 

£  -=r — E[min(k  2:  1:  Xk  e  |  X0  =  s]  =  (£  xf'. 

2-  x  ;T 

Thus,  there  is  a  state  S  €  such  that 

(  £  re,)'1  <1  E[min{k  2  1:  Xk  e  Cy)  l*o  *  8], 


E[min{k>l:  Xk  €  C7)  |X0  =  SI  =  1  +  £  R„expHc(s)-c(S)l/DE[min{k  2  1:Xke  |Xo  =  s] 

*«s-^ 

and  £  RS|  <  1 ,  there  must  exist  a  state  se  S-^  such  that 


exp(-[c(s}-c(S)]/T)E(min{k  2  l:  Xk  e  |Xo  =  s]  + 
The  previous  inequality  implies 


*  (  I  *<,)''■ 

oe^ 


exp(-[c(s)-c(§)]/T)E[min{k  2  1:  Xk  €  £?}  (  X0  =  s]  £  (  £  7i0)_I 

o«CT 


which  in  turn  implies 


£  cxp(-c(o)/T) 
<*s-^ _ 

l^lexpf-y/D 


E[min{k  2  1;  Xk  £  £y)  |X0  =  s)  2 


£  exp(-[c(o)  -  y]/T) 
(H  S-sT 

ICTlexp(— [c(s>  -  c(§)]/T) 


|;y|exp(-(c(s)  -  c(S))/T) 


2  exp((lc(s)  -  c(S)J  -  [y0  -  YlVT)|AT|/|CTj 


2  Ia^I/ICtI. 


and  we  are  done. 


Proof  of  Theorem  2.4.1 :  Let  XH  =  J  and  XL  =  Then  A(Xh,Xl)  is  a  graph  with  at 

P  4 

most  n  nodes  and  maximum  node  degree  at  most  d*.  For  positive  integer  k,  let  Sk  be  the  set  of  all 

|Sk+,|  (2XH-k)Xt' 

matchings  of  A(Xh,Xl)  of  size  |H|  -  k.  From  Lemma  2.4.1,  ■  £  — — -  for  all  k  S 

|Sk|  k+1 


L^J.  Thus,  for  all  k  £  we  have 


\SM  I 


£jl|j 


LfJ*1 


lfj+.  LfJ-1 


Let  Q  =  — - .  Then  |Sk|  i£  (1/Q)  p  |S  n  |  for  all  k  S  L4J-  Therefore, 

Lfj-M  1tj*‘  " 


SlSk|  £  |S  |  £ (1/Q)k+I  <  IS  H(l/Q)k+1. 
k=0  t|-J+‘  k»o  LjJ+1  k-0 


(2.4.2) 


Since  we  have  assumed  that  8  <  p  <  — ,  Q  is  greater  than  one,  and,  thus,  the  right  side  of  (2.4.2) 

4 

is  finite  and  is  equal  to  |S^„  j  ^  |(Q— 1)_1.  Hence,  it  follows  that 


I* 


J-O 

Let  G  =  A(XH,  XJ.  Lemma  2.4.2  and  (2.4.3)  imply  that,  for  any  T  >  0  and  any  transition  proba¬ 
bility  matrix  R  €  R(G), 


ill 

L-J  4 

max  E[min{k  :  |Xk(G-R/n|  ^  |H|  -  L4j)  I  =  M]  S  - 

Me  M'(G)  P  I— 1+1 

V 


-  1.  (2.4.4) 


Since  (1  -  P_1)m*(A(XH,  XJ)  =  (1  -  (3_1)|H|  S  |H|  -  [-—J,  Inequality  (2.4.4)  implies  that,  for  any 


T  >  0  and  any  transition  probability  matrix  R  e  R(G), 


LtJ^ 


max  E[min{k :  |X^^|  2:  (l-p_1)m*(G)}  |  «  M]  *  - 1, 

Me  M'(G)  |JLj+1 


(3- 


which  implies  the  theorem. 


2.5.  Speculations 

We  believe  that  Theorem  2.2.1  is  true  for  constants  0!  and  a2  much  larger  than  what  we  pro¬ 
vided  in  the  proof  and  that  ER  is  significantly  smaller  than  the  upper  bound  given  in  Theorem 
2.3.1.  Moreover,  we  conjecture  that  for  0  <  r  <  1,  the  average  time  needed  for  the  controlled 
processes  described  in  Section  2.2  to  reach  a  matching  having  cardinality  at  least  the  maximum 
possible  minus  |V|r  is  not  upper  bounded  by  a  polynomial  in  | V  j  for  some  sequence  of  graphs. 
We  believe  that  graph  A(XH  Xj  of  section  2.4  and  the  techniques  of  Section  2.2  could  be  used  to 


prove  this  conjecture. 


The  upper  bound  on  ER  given  in  Theorem  2.3.1  is  valid  for  all  graphs  with  a  specified 
number  of  nodes  and  maximum  node  degree  value.  In  the  next  chapter,  we  try  to  bound  £R  when 


I 
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we  restrict  our  attention  to  graphs  G  that  are  "typical"  in  some  sense. 

Our  methods  of  analyzing  simulated  annealing,  like  the  deterministic  methods  known  for 
solving  the  maximum  matching  problem,  do  not  easily  carry  over  to  "industrial  strength"  variations 
of  the  problem  or  to  other  problems.  More  work  will  be  needed  to  evaluate  the  average  time  com¬ 
plexity  of  simulated  annealing  and  other  search  heuristics  for  a  wide  range  of  problems. 


CHAPTER  3 


MATCHING  PROBLEM:  AVERAGE  PERFORMANCE  FOR  TYPICAL  GRAPHS 

3.1.  Introduction 

In  Theorem  2.3.1,  we  presented  an  upper  bound  on  the  average  amount  of  time  the  basic 
simulated  annealing  algorithm  of  Subsection  2.2.1  takes  to  find  a  matching  with  size  at  least  a  frac¬ 
tion  1  -  (T1  of  the  maximum  matching.  If  (3  =  ( V  |/2  then  the  upper  bound  is 

6|V|7(2d*)|v|-:.  (3.1  1) 

Expression  (3.1.1)  is  an  upper  bound  on  the  average  amount  of  time  the  algorithm  takes  to  find  a 

maximum  matching  of  a  graph  (V,E)  with  maximum  node  degree  d*.  Therefore,  if  we  exclude 

graphs  with  no  edges  then  (3.1.1)  is  exponential  in  |V|.  However.  (3.1.1)  is  a  bound  applying  to 

all  graphs  of  a  specified  number  of  nodes  and  value  of  maximum  node  degree. 

Our  objective  of  this  chapter  is  to  find  an  upper  bound  on  the  average  time  complexity  of 
simulated  annealing  for  the  matching  problem  for  a  "typical"  graph.  We  will  make  this  objective 
precise  after  the  following  definitions.  Let  m  be  an  increasing  positive  integer  valued  function  of 
n.  Typically,  m(n)  =  [cn6J  where  8  and  c  are  constants.  In  this  chapter,  we  will  use  m  as  a  func¬ 
tion  only  of  n,  so  we  write  m  for  m(n).  Let  G(njn)  be  the  set  of  all  graphs  with  node  set  {1.2,.  . 

.  ,  n)  and  m  edges.  Suppose  An  is  the  subset  of  all  graphs  in  G(n,m)  with  some  property  Q,  and 
|An|/|G(njn)|— »1  as  n— ►».  Then  we  say  that  almost  every  graph  has  property  Q.  Our  aim  is  to 
find  a  small  function  g  such  that  for  almost  every  graph  the  average  time  it  takes  the  simulated 
annealing  algorithm  of  Subsection  2.1.2  to  find  a  maximum  matching,  if  the  graph  has  n  nodes,  is 
at  most  g(n).  In  this  sense,  g(n)  upper  bounds  the  average  time  complexity  of  simulated  annealing 
for  the  matching  problem  for  a  "typical"  graph  with  n  nodes  and  m  edges. 


Since  we  do  not  know  of  an  exact  analytical  method  to  find  a  small  function  g,  we  use 
heunsuc  approximations  of  the  annealing  process  to  estimate  one.  The  approximations  of  the 
annealing  process  and  an  estimate  for  a  small  g  are  presented  in  Section  3.2.  The  estimate  is  com¬ 
pared  with  computer  simulations  in  Section  3.3.  and  conclusions  are  given  in  Section  3.4. 

3.2.  Approximations  and  Estimates 

We  are  interested  in  approximating  two  processes.  The  first  is  the  annealing  process 
<Xk.  k  2  0)  of  the  basic  simulated  annealing  algorithm  of  Subsection  2.1.2  for  a  "typical"  graph  in 
Gmjn)  and  T,  =  T  for  all  i  £  0.  Note  that  since  the  sequence  of  temperature  values  is  not  decreas¬ 
ing,  (Xk:  k  2  0)  is  not.  stnctly  speaking,  an  annealing  process.  We  will  approximate  (Xk:  k  £  0) 
by  a  process  fYk  k  £  0)  that  also  has  a  parameter  T.  We  will  then  use  (Yk:  k  £  0)  to  approximate 

E(mtn(k:  Xk  is  maximum)  | Xo  *  0).  (3.2.1) 

when  the  graph  is  a  typical  element  of  G(n.m),  and  to  find  a  value  of  T  so  that  (3.2.1)  is  small. 

The  second  process  we  are  interested  in  approximating  is  ()Ck  :  k  £  0).  which  is  the  limit  in 
distribution  of  the  process  (XJ(k):  k  2:  0)  as  T— »0,  where  J(0)  *  0  and  J(k+1)  =  min{j  S  J(k): 
Xj  *  XJrk))  for  all  k  2  0.  The  following  is  a  procedure  for  simulating  (Xk:  k  2:  0).  Let  Xq  equal  a 

matching.  Having  selected  Xq,  &i . choose  Xk+l  as  follows.  If  Xk  is  maximal  then 

choose  an  edge  e  at  random  from  Xk,  all  such  edges  being  equally  likely,  and  let  Xk+|  =  Xk  -  e 
If  Xk  is  not  maximal  then  choose  an  edge  e  at  random  from  the  set  of  edges  matchable  relative  to 
Xk.  all  such  edges  being  equally  likely,  and  let  Xk+i  =  Xk  +  e.  Note  that  (Xk:  k  2:  0)  is  not  depen¬ 
dent  on  a  temperature  parameter,  and,  by  the  theorem  of  Berge  and  Norman  and  Rabin  [10, 
Theorem  10.1],  it  will  eventually  visit  a  maximum  matching. 

We  will  approximate  (Xk:  k  2  0)  by  (Yk:  k  2  0),  which  we  define  as  the  limit  in  distribution 
as  T-+0  of  (Xj-fl^:  k  2  0),  where  J'(0)  =  0  and  J'(k+1)  =  min[j  2  J'(k):  Y.  ^Yj-^)}  for  all  k  £  0. 


Then  we  will  use  (Yk:  k  i  0)  to  approximate 

E(min{k  2  0:  Xk  is  maximum)  |X0  =  0],  (3.2.2) 

when  the  graph  is  a  "typical"  dement  of  G(n,m).  We  are  interested  in  estimating  (3.2.2)  for  two 

reasons.  First,  in  Section  3.3,  how  close  (Yk:  k  £  0)  approximates  (Xk:  k  2  0)  is  determined  by 

comparing  the  estimate  (3.2.2)  with  data  from  computer  simulations.  This  will  be  an  additional 

check  on  how  accurately  (Yk:  k  £  0)  approximates  (Xk:  k  £  0).  Second,  simulating  (Xk:  k  £  0)  is 

an  alternative  to  the  simulated  annealing  algorithm  for  the  matching  problem,  which  at  times  may 

be  preferable.  Hence,  we  are  also  interested  in  its  time  complexity. 

This  section  is  organized  as  follows.  Process  (Yk:  k  £  0)  is  presented  next.  Using 
(Yk:  k  S  0),  we  give  an  estimate  of  a  value  of  T  that  should  make  (3.2.1)  small.  Finally,  we 
present  an  estimate  for  (3.2.1)  (resp.,  (3.2.2))  using  (Yk:  k  £  0)  (resp..  (Yk:  k  £  0)). 


The  state  space  of  (Yk:  k  2  0)  is  {s(i,0),  s(i,I),  s(i,2),  s(i,3)}  and  each  state  corresponds  to 

i-0 

a  type  of  matching  rather  than  a  particular  matching.  State  s(i,j)  corresponds  to  matchings  of  size  i 
that  have  their  matchable  edges  configured  in  the  following  way:  for  j  equal  to,  respectively,  0,  1, 
2,  or  3,  the  set  of  matchable  edges  is,  respectively,  empty,  a  single  edge,  a  path  of  length  three,  or 
two  disjoint  edges.  The  transition  probability  P[Yk+1  =  s(ai,bi)|Yk  =  s(ao.bo)]  approximates  the 
"typical"  one-step  transition  probability  of  (Xk:  k  £  0)  from  an  s(a<),bo)-type  matching  to  an 
sfa1.bi)-tvpe  matching.  Note  that  an  s(ij)-type  matching  does  not  exist  for  all  i  and  j  such  that 

0  £  i  £  [—  J  and  0  £  j  £  3.  For  example,  all  edges  are  matchable  relative  to  a  matching  of  size 

zero,  but,  there  are  at  most  three  edges  that  are  matchable  relative  to  an  s(0,0)-,  s(0, 1)-,  s(0,2)-,  or 
s(0.3)-tvpe  matching.  These  inconsistencies  are  due  to  the  fact  that  process  (Yk:  k  >  0)  is  based  on 
some  assumptions  on  G(n,m),  T,  and  process  (Xk:  k  >  0).  We  ignore  these  inconsistencies, 
because  process  (Yk:  k  >  0)  is  relatively  simple  and  it  seems  to  do  a  fair  job  of  approximating  the 
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behavior  of  (Xk:  k  £  0). 


We  will  now  discuss  three  assumptions  on  G(n,m),  T,  and  process  (Xk:  k  £  0)  that  process 


(Yk;  k  S  0)  is  based  oa  First,  we  assume  that  ■jexiKM/T)  is  a  small  fractioa  Under  this  assump¬ 


tion  if  Xq  is  a  matching  that  is  not  maximal  then  with  high  probability  the  next  matching 
(Xk:  k  >  0)  visits  will  be  a  larger  matching.  We  make  this  assumption  because  we  are  only 
interested  in  values  of  T  that  make  (3.2.1)  small,  and  if  T  is  large  enough  so  that  the  long-term 
drift  of  (Xk:  k  >  0)  is  not  towards  larger  matchings  then  (3.2.1)  will  be  large. 


Second,  we  assume  that  the  average  degree  of  the  graph  (=  -^-)  is  small  relative  to  n.  For 

n 


the  rest  of  this  chapter  we  will  use  d  for  the  average  degree.  We  are  most  interested  in  the  case 


d  . 


when  —  is  small,  because  for  a  fixed  value  of  n,  the  smaller  d  is  the  more  time  it  takes  (Xk:  k  S  0) 
n 


to  reach  a  maximum  matching. 


Third,  suppose  that  (Xk:  k  >  0)  typically  occupies  matchings  of  size  i.  Then  we  assume  that 
n  -  2i  is  small.  The  value  n  -  2i  is  an  estimate  of  the  number  of  exposed  nodes  relative  to  a 


1 


matching  of  size  i.  The  estimate  will  be  accurate  if  m  >  cnlogn  and  c  >  — ,  for  then  maximum 


matchings  typically  have  cardinality  near  (see  [17]).  Our  assumption  that  n  -  2i  is  small  should 


be  consistent  with  the  assumption  that  exp(-l/T)-|J-  and  —  are  small,  because  then  we  would  expect 

2  n 


that  for  a  large  fraction  of  the  time,  before  reaching  a  maximum  matching,  (Xk:  k  £  0)  will  be  in 
matchings  that  have  size  that  are  near  maximum. 


Before  describing  (Yk:  k  £  0)  in  more  detail  we  will  simplify  the  notation  by  letting 
p(aI,b1;a0,b0)  =  P[Yk+I  =  s(ao,bo)|Yk  =  sfa^bj)]  and  q  =  n  -  2i.  A  schematic  description  of 
(Yk:  k  >  0)  is  given  in  Figure  3.2.1.  Each  box  in  the  figure  corresponds  to  the  state  labeled  on  its 
left  or  upper  left  comer.  Inside  each  box  is  the  configuration  of  matchable  edges  for  that  state, 


o  o  o  o 


s.'  L"/2j  .0)  ^/s(ln/2j.1) 

^s(  Ln/2  j  .2) 

0 

0—0 

s(  ln/2 J  ,3) _ 

O - O 

o — o 


Figure  3.2.1.  Stale  diagram  of  the  (Yk:  k  2  0). 
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where  0  means  no  matchable  edges.  An  arrow  from  a  box  corresponding  to  state  s(ao,b0)  to  a  box 
corresponding  to  sfa^b,)  means  the  transition  probability  from  s(ao,b0)  to  s(a,,b])  is  positive.  We 
do  not  indicate  by  arrows  the  positive  transition  probability  of  a  state  to  itself.  The  transition  pro¬ 
bability  values  are  presented  in  Figure  3.2.2.  We  will  ignore  states  s(0,0),  s(|yj,2),  and  s(LyJ,2), 
because  there  are  no  transitions  out  of  these  states  with  strictly  positive  probability. 

Next  we  will  give  heuristic  reasons  for  our  choice  of  the  states  and  transition  probabilities  of 
(Yk:  k  >  0).  Note  that  all  transition  probabilities  of  (Yk:  k  £  0)  that  correspond  to  increasing  the 
size  of  the  matching  are  chosen  to  be  consistent  with  the  configuration  of  matchable  edges  of  the 
state  of  which  it  came  from.  For  example,  there  are  three  matchable  edges  relative  to  an  s(i,2)- 
type  matching.  Matching  one  of  these  edges  gives  you  an  s(i+l,0)-type  matching  and  matching 
either  one  of  the  other  two  edges  gives  you  an  s(i+l,l)-type  matching.  Hence,  p(i,2;i+l,0)  =  m_I 

and  p(i,2;i+l,l)  =  2m'1.  We  assume  exp(-l/T)y  is  small  and,  therefore,  if  (Xk:  k  >  0)  is  in  an 
s(i,2)-  or  s(i,3)-type  matching  then  with  high  probability  the  next  matching  it  visits  will  be  a  larger 
one.  Thus,  for  all  i  <  [yj,  we  ignore  the  possibility  of  a  transition  from  either  states  s(i,2)  or 
s(i,3)  that  corresponds  to  decreasing  the  matching  size. 

We  will  now  discuss  the  transitions  out  of  s(i,0),  which  is  the  state  corresponding  to  maximal 
matchings  of  size  i.  In  this  discussion,  we  will  suppose  M  is  a  "typical"  maximal  matching  of  size 
i.  We  assume  that  for  each  e  e  M  the  end  nodes  of  e  do  not  have  a  common  neighbor,  which  is 

consistent  with  the  assumption  that  —  is  small.  Since  M  is  maximal,  for  all  edges  e  e  M,  all 

n 

matchable  edges  relative  to  M-e  share  a  node  in  common  with  e.  Thus,  we  can  partition  M  into 
two  subsets  A  and  B,  where  subset  A  contains  all  edges  e  such  that  each  node  of  e  has  at  least  one 
neighbor  that  is  exposed  relative  to  M,  and  subset  B  contains  all  edges  e  such  that  at  most  one 


node  of  e  has  a  neighbor  that  is  exposed  relative  to  M. 


For  all  i  such  that  1  <  i  <>  I— I, 

l2 


kvS 


p(i,0;i-l,2)  =  min{l,((n-2i)(d-l)/n)2}i-^ 


p(i,0;i-l.l)  =  max  { 0,  l-((n-2i)(d- 1  )/n)2} 

m 


p(i,l;i-l,2)  =  min{2(d-l)4) 


exp(-l/T) 


p(i,l;i-l,3)  =  max{0,i-2(d-l)} 


exp(-l/T) 


For  all  i  such  that  0  <  i  <  I— I  -  1, 

2 


p(i,2;i+l,0)  =  — , 

m 


p(i2;i+l,l)  =  — , 
m 


p(i,3;i+l,l)  =  — . 

m 


All  other  transition  probabilities  are  zero,  with  the  exception  of  transition  probabilities  from  a  state 
to  itself. 


Figure  3.2.2.  Transition  probabilities  of  (Yk:  k  >  0). 
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Our  estimate  of  the  size  of  A  and  our  characterization  of  the  types  of  "typical"  edges  in  sets 
A  and  B  are  based  on  the  following  argument.  Let  e  =  [i,  j]  be  a  "typical"  edge  of  M.  We 
assume  that  the  degree  of  each  node  of  e  is  d.  We  also  assume  that  for  each  neighboring  node  v 

of  i  or  j  that  is  neither  i  nor  j,  the  probability  of  v  being  exposed  relative  to  M  is  -3-.  Combining 

*  n 

these  assumptions  on  e  and  the  assumption  that  q  and  ^  are  small  we  have  the  following  approxi¬ 
mations: 

P[The  nodes  of  e  have  no  exposed  neighbors] 

=  (1  -  =  1  -  2i-d  ~  q,  (3.2.3) 

n  n 

P[Each  node  of  e  has  exactly  one  exposed  neighbor] 

=  ((d  -  l)-3-(l  -  -9-)*1-2)2  =  ((d  -  D-9-)2,  (3.2.4) 

n  n  n 

and 


r; 


P[Each  node  of  e  has  at  least  one  exposed  neighbor] 

=  (1  -  (1  -  •a)d-1)2  =  ((d-l)-3.)2  (3.2.5) 

n  n 

Approximations  (3.2.3)  and  (3.2.5)  and  the  assumption  that  q  and  —  are  small  imply  that 

n 

most  of  the  edges  e  of  B  are  such  that  e  is  the  only  matchable  edge  relative  to  M  -  e.  Hence,  if  e 
€  B  then  M  -  e  is  likely  to  be  of  type  s(i-l.l).  Approximations  (3.2.4)  and  (3.2.5)  imply  that 
most  of  the  edges  e  of  A  are  such  that  the  set  of  matchable  edges  relative  to  M-e  is  a  path  of 
length  three.  Therefore,  if  e  e  A  then  M  -  e  is  likely  to  be  of  type  s(i-l,2).  Approximation 
(3.2.5)  implies  that  |A|  is  approximately  (q(d-l)/n)2|M|.  Since  B  =  M  -  A,  we  approximate  |B| 
by  [  1— (q(d— 1  )/n)2]  ]M  ].  Based  on  these  approximations,  we  let  p(i,0;i-l,2)  = 

m in { 1  ,(q(d- 1  )/n)2 } i -e^~ 1  ^  and  p(i,0;i-l,l)  =  max{0,l-(q(d-l)/n)2}i exP<~1/T^.. 

m  m 


r 


We  will  now  discuss  the  transitions  out  of  s(i,l),  which  is  the  state  that  corresponds  to 
matchings  of  size  i  that  have  exactly  one  matchable  edge  relative  to  it.  For  e  =  [u,  v]  4  M,  He  is 
the  set  of  edges  [u\  v']  in  M  such  that  u'  or  v'  is  adjacent  to  one  of  u  or  v.  Let  Ue  be  the  set  of 
all  nodes  such  that  there  is  a  path  of  length  at  most  four  from  v  to  a  node  of  e.  The  following 

assumption  is  consistent  with  the  assumption  that  —  and  q  are  small.  We  assume  that  the  sub- 

n 

graph  induced  by  Ue  is  a  tree,  and  all  nodes  in  Ue,  except  the  nodes  of  e,  are  matched.  Under 
these  assumptions,  an  edge  e'  e  M  is  such  that  M-e'  is  an  s(i-l,2)-type  matching  if  and  only  if  e  e 
He  (see  Figure  3.2.3  for  an  example  of  He  under  these  assumptions).  We  expect  that  if  e'  e  He 
then  the  set  edges  matchable  relative  to  M  -  e'  would  be  a  path  of  length  three.  Since  we  assume 

q  and  —  are  small,  we  also  expect  that  most  edges  e'  e  M  -  He  are  such  that  the  set  of  edges 
n 


;«| 

t 


matchable  relative  to  M  -  e'  are  two  disjoint  edges.  Therefore,  we  assume  that  if  e'  =  M  -  H* 
then  M  -  e'  is  an  s(i-l,3)-type  matching.  Since  |He|  is  the  sum  of  the  degrees  of  the  nodes  of  e 
minus  two,  we  approximate  it  with  2(d  -  1).  Based  on  these  approximations,  we  let  p(i,l;i-l,2)  = 


min  2(d-I),i 


exp(-l/T)  and  p(i,l;i-l,3)  = 


axiu,i-2(d-i 


-exp(-l/T). 


We  now  turn  to  determining  a  value  of  T  that  will  make  (3.2.1)  small.  Let 


D(j)  =  min(k:  Yk  =  s(j+l,0)  or  s(j— 1,0)} , 


Ajj+1  =  P[Ydq)  =  s(j+l  ,0)  |  Y0  =  s(j.O)]. 


Ajj-i  =  P[YD(j)  =  sO-1.0)  |  Y0  =  s(j.O)]  =  1-AJJ+1. 


We  will  approximate  (3.2.1)  by 


E[min{k  >  0:  Yk  =  s(L^J,0)}  |  Y0  =  s(0,l)]. 


which  is  equal  to 


I  H[D(j)[Y0  =  s(j,0)] 

j=2 


T -  Z  n  (AkJt-l/Ak.k+l) 

i-j  krj+1 


+  E[D(1)  |Y0  =  s(l,0)]  2  fl  (Ak>-i^ Atk+i)  + 
rl  k«2 


where  n  (o-i. 

k*j+l 


(3.2.6) 


(3.2.7) 


To  prevent  (3.2.7)  from  being  exponential  in  n,  we  want  T  to  be  such  that  for 


j  <  [— J-l.  Then  (3.2.7)  is  at  most 


3  £  E[D(j)|Yo=s(j,0)]  +  m. 

J-2 

It  can  be  shown  that  if  2  <  i  S  [— J  -  1  then  A14_]/A,  1+1  equals 


(3.2.8) 
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n  \2  * i  ,^1  3+min  i,2(d-l)}exp(-l/D 

max{0,( - ) -l  min{i-l,2(d-l)  exp(-l/T)— - — — r—  t  —  r  .  ■ 

(d-l)q  2  3+min{i-l,2(d-l)}exp(-l/T) 


S( 


Thus,  if  T  satisfies  — exp(-l/T)-|^  1  then  A^/A^i  £  for  2  S  i  S  L^J  -  1-  Although  we 

want  T  to  satisfy  — - — exp(-l/T)— <  1  we  do  not  want  T  to  be  too  small,  otherwise 
3  (d-1)  3 

E[D(i)|Y0  =  s(i,0)]  will  become  very  large.  Hence,  a  value  of  T  that  makes  (3.2.7)  small  is  one 
where 


exp(-l/T)  = 

TT 

and  c  is  a  constant  <  1.5.  If  (Yk:  k  >  0)  is  an  accurate  approximation  of  (Xk:  k  >  0)  then  such  a 
value  of  T  will  also  make  (3.2.1)  small. 

We  now  turn  to  evaluating  E[D(i)|Y0  =  s(i,0)].  The  exact  expression  for  E[D(i)|Y0  =  s(i,0)] 
is  complicated,  so  we  will  approximate  it  with 


_ 1.5m _ 

exp(-irT)min{l,(— (n-2i))2)i  (3'2-9) 

n 

This  is  an  accurate  approximation  if  T  is  very  small.  Using  Approximation  (3.2.9),  we  get  the  fol¬ 
lowing  approximation  for  (3.2.8): 


4.5m 


LyJ-i 


exp(l/T)  £ 


*“2  min{l,(— — (n— 2i))2}i 


+  m 


<  (4.5m>exp<T/T) 


I  T  +  I 


-2  i(— (n-2i))2 


+  m 


V.'  < 


rz 


=  (4.5m)exp(l/T)  log(n)  + 


<  (4.5m)cxp(l/T)  log(n)  + 


=  (4.5m)exp(l/T)  log(n)  + 


£  (4.5m)exp( l/T)(log(n)  +  ^ 

2  (d-1)2 

=  |exp(LT)(2dnlog(n)  +  5y). 


n2 

<5*  , 

Y  1  i 

lfj-1 

r 

1 

(d-1)2 

m  i(n-2i)2 

• 

n  i(n— 2i)2 

J+ 

n2 

®  , 

Y  1  1 

1 

L-fj-i 

y 

(d-1)2 

h  2(n— 4))2 

k 

n2 

n/6  62  1 

• 

(d-1)2 

2n2  n^4i2 

k  4 

4 

(3.2.10) 


Finally,  if  exp(-l/T)  <  1.5  ^  P  then  Expression  (3.2.10)  is  our  estimate  of  (3.2.1).  Note  that 

rr 

E[D(L-jJ-l)|Y0  =  s(LyJ-D.O))  S  A.exp(l/D-^?  =  -Lexp(1A>0_  Thus,  if  (Yk:  k  £  0)  is 

a  good  approximation  for  (Xk:  k  >  0)  and  d  =  o((n/logn)0'5)  then  (Xk:  k  >  0)  will  be  such  that  for  a 
large  portion  of  its  time  before  it  visits  a  maximum  matching  it  will  be  in  near  maximum  match¬ 
ings.  This  is  consistent  with  one  of  our  assumptions  used  to  define  (Yk:  k  >  0). 

If  c  is  a  constant  such  that  c  S  1.5  and  exp(-l/T)  =  c  .  then  (3.2.10)  will  be  approxi¬ 


mately 


Therefore,  if  (Yk:  k  >  0)  is  a  good  approximation  of  (Xk:  k  >  0),  then  (3.2.11)  is  an  upper  bound 
on  (3.2.1)  for  almc  *  every  graph. 

We  now  turn  to  approximating  (3.2.2).  Recall  that  (Yk:  k  £  0)  is  the  limit  in  distribution  of 
(YJflt):  k  >  0)  as  T— >0,  where  J(0)  =  0  and.  for  k  >  0,  J(k+1)  =  minjj  >  J(k):  Y;  *  YJ(k)).  Let 

to  =  0  and  tfc  =  min{j  >  0:  Y,  =  s(k.0)}.  It  is  straightforward  to  show  that 


E[tk+i-tk|Y0  =  s(0,l)]  =  3max{(- 


(d-l)(n-2k) 


r.l}+l.  Then 


L-jJ-J 


E[min(j  >  0:Yj  =  ( L~J.O)}  |Y0  =  s(0,l)]  =  3  £  [max{(— — -)2,1}+1]  +  1 
2  j.i  (d-i)(n-2t) 


3  /  n* 


+  n). 


(3.2.12) 


2  (d-1)2 

Finally,  we  will  use  (3.2.12)  as  our  approximation  for  (3.2.2).  If  (Yk:  k  £  0)  is  a  good  approxima¬ 


tion  of  (Xk:  k  >  0)  then  (3.2.12)  is  an  upper  bound  on  (3.2.2)  for  almost  every  graph.  Note  that 


E[t  „  -  t  n  lYn  =  s(0,l)]  £  — ( — - — )2  +  1.  If  our  approximations  are  correct  then  (Xk:  k  £  0) 
1  if  J  L-jJ-i  0  4  (d-1) 


will  spend  a  large  portion  of  its  time  in  near  maximum  matchings  before  finding  a  maximum 
matching. 


3.3.  Experimental  Results 


In  this  section,  we  will  experimentally  evaluate  how  well  (Yk:  k  £  0)  approximates 
(Xk:  k  >  0)  and  how  well  (Yk:  k  >  0)  approximates  (Xk:  k  >  0).  First,  we  will  focus  on  evaluating 
the  accuracy  of  (Yk:  k  >  0).  We  check  the  accuracy  of  (Yk:  k  >  0)  by  using  it  to  predict  the  sam¬ 
ple  means  and  sample  standard  deviations  of  j  =  min(j:  Xj  is  maximum)  and  T  =  "tm.  -  tm._,,  where 
m*  is  the  size  of  the  largest  matching  and  tk  =  min(j:  |Xj|=k).  Recall  that  G(n,m)  is  the  set  of  all 
graphs  with  node  set  { 1,  2,  .  .  .  at)  and  m  edges.  If  (Yk:  k  >  0)  is  an  accurate  approximation  of 
(Xk:  k  £  0)  then  for  most  graphs  in  G(ngn)  the  mean  of  J  will  be  (3.2.12)  and  the  mean  of  T  will 


3  n  2 

be  approximately  E[t^n ^  -  t^  JY0  =  s(0,l)],  which  is  approximately  — ( ^—^y)  if  n  is  even. 


We  collected  data  from  simulations  of  (Xk:  k  £  0)  as  follows.  For  each  8  e  (0,  0.25.  0.5, 
0.75.  1.0),  wc  let  m  =  [con1*4].  where  c0  =  80(32)^,‘^).  and  we  considered  five  values  of  n:  32. 
128.  256,  and  512.  For  each  (5,n)  pair,  we  randomly  and  without  bias  selected  one  hundred 


graphs  from  G(n,m)  with  replacement.  For  each  graph,  we  simulated  (Xk:  k  >  0)  once.  From 
these  one  hundred  simulations  we  computed  the  sample  mean  m(J)  and  standard  deviation  o(J)  of 
1,  and  we  computed  the  sample  mean  m(t)  and  standard  deviation  a(t )  oft.  Note  that  m(j)  (resp., 
m(t))  is  an  estimate  of  the  mean  of  (3.2.2)  (resp.,  E[t|  Xq  =  0])  over  all  graphs  of  G(njn).  The 
sample  means  and  standard  deviations  are  given  in  the  tables  in  Figure  3.3.1.  These  tables  also 

*  3  n 

contain  the  values  of  (3.2.12),  which  are  listed  under  a(j),  and  the  values  of  —  (— — -)2,  which  are 

4  (d— 1) 

listed  under  aft). 

Note  that  the  m(t)  is  at  least  a  half  of  m(J),  which  is  consistent  with  our  prediction  that 
(Xk:  k  >  0)  will  spend  a  large  portion  of  its  time  in  near  maximum  matchings. 

We  estimate  the  rate  at  which  m(t)  and  m(J)  grow  with  n  using  the  following  procedure. 
First,  for  each  5,  we  find  an  argument  (a(t),P(t))  of 

min  £  (logt(nHog(e“np))2, 

n»32,64, 128 ,256,5 12 

where  t(n)  =  m(t)  for  the  graphs  with  n  nodes.  In  this  sense,  for  the  value  of  5,  eaft)n^(T)  is  the 
"best-fit"  curve  to  values  of  m(t)  as  a  function  of  n.  Similarly,  we  find  a  pair  (a(J),P(l))  for 
values  of  m(J).  The  values  of  (a0t),j3(^))  and  (a(J),p(j))  are  presented  in  Table  3.3.1.  If 
(Yk:  k  >  0)  is  an  accurate  estimate  of  (Xk:  k  >  0)  we  would  expect  P(t)  to  be  approximately  equal 
to  2(1-5)  and  P(J)  to  be  approximately  equal  to  max(2-25,l).  We  also  list  the  values  of 
max{2-25,l}  (resp.,  2(1-5))  under  P(J*)  (resp.,  P(x‘)). 


Table  3.3.1  shows  that  our  predictions,  (3.2.12)  and  — ( — - — 

4  (d-1) 


)2,  tend  to  be  more  accurate  for 


sparse  graphs.  This  is  not  surprising,  since  our  approximations  were  based  on  the  assumption  of  a 
sparse  graph.  In  addition,  our  predictions  seem  to  be  asymptotic  upper  bounds  of  the  sample 
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Figure  3.3.1.  Tables  of  sample  means  md)  and  mij);  sample  standard  deviation  oft)  and  o(j), 
and  our  estimate  a(t)  (resp..  a(J)i  of  t  frcsp..  J).  based  on  the  approximation  (Yk:  k  £  0)  of 
'\k  k  >  0>  The  graphs  considered  have  n  nodes  and  m  =  [con1*6]  edges. 


Table  3.3.1.  Function  eaft)np(')  of  n  is  the  best-fit  curve  of  m(T)  (resp.,  m (1))  as  a  function  of  n. 
The  value  of  (3(t*)  (resp.,  p(J*»  is  the  estimate  of  0Ot)  (resp.,  £(3))  based  on  the  approximation 
(Yk:  k  £  0)  of  (Xk:  k  £  0). 
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In  Figure  3.3.2,  we  have  three  graphs,  corresponding  to  5  =  0,  0.5,  1.0.  In  each  graph,  we 
plot  m(3),  e“(7)nw,),  and  (3.2.12)  versus  n  corresponding  to  the  particular  value  of  5  for  that  graph. 

In  Figure  3.3.3,  we  plot  three  graphs  of  mft),  ea(')nP(t),  and  — ( — 2 — )2  versus  n  for  5  equal  to  0. 

4  (d— 1) 

0.5,  10.  In  these  graphs,  the  best-fit  curves  approximate  the  sample  mean  data  fairly  accurately, 
except  for  the  curve  approximating  m(J)  when  6  =  0.  These  graphs  also  indicate  that  (3.2.12) 
3  n  7 

(resp.,  —  (  -  ■)  )  ‘S  an  asymptotic  upper  bound  on  m(J)  (resp.,  m(t)). 

Next  we  will  determine  how  accurately  (Yk:  kiO)  approximates  (Xk:  k  2  0)  when  exp(-l/T) 
is  approximately  —  —  - .  We  check  the  accuracy  of  (Yk:  k  2  0)  by  using  it  to  predict  the  sample 

means  and  the  sample  standard  deviations  of  1  =  min[k:  Xk  is  maximum).  If  (Yk:  k  St  0)  is  an 
accurate  approximation  of  (Xk:  k  >  0),  then  for  most  graphs  in  G(n.m)  the  mean  of  1  will  be 

4 

bounded  above  by  (3.2.10).  which  is  approximately  3n3logn  +  7.5-^-. 

d‘ 


58 


O 
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For  the  values  of  n  we  considered,  I  can  be  quite  large,  because  the  value  of  T  we  used  was 
so  small  that  for  a  large  portion  of  the  time  (Xk:  k  >  0)  would  be  sitting  in  maximal  matchings. 
To  be  more  computationally  efficient  in  estimating  the  mean  of  I  we  do  the  following.  First,  set  a 
register  i  to  zero.  Then  simulate  the  process  (Xj^:  k  £  0),  where  J(0)  =  0  and 

A  JU 

J(k+1)  =  min{j  £  J(k):Xj  *  XJ(k)}.  For  every  value  of  k  £  0,  we  increment  I  by  - r— — r, 

Tic  +  lxj(k)l 

where  yk  is  the  number  of  edges  that  are  matchable  relative  to  X1(ky  When  (Xj^y  k  k  0)  reaches  a 
maximum  matching  we  stop  the  simulation  and  i  is  our  estimate  of  1. 

We  computed  sample  means  of  I  in  the  same  way  we  computed  sample  means  of  )  and  t, 
i.e.,  for  each  value  of  n  and  8,  100  graphs  were  randomly  selected  from  G(nm),  and,  for  each 
graph,  a  simulation  of  (XJ(k):  k  >  0)  was  done  and  statistics  were  taken.  The  sample  mean  m(l) 
and  the  sample  standard  deviation  c(i)  of  i  are  given  in  the  tables  in  Figure  3.3.4.  Note  that  m(!) 
is  an  estimate  of  the  mean  of  (3.2.1)  over  all  graphs  of  G(n,m).  Also  listed  in  the  tables,  under 

4 

a(I),  is  the  value  of  3n3logn  +  7.5— r.  We  also  compute  the  pair  (a(i),P(I))  for  the  best-fit  curve 

<r 

ead)n0O)  0f  [he  vajues  0f  m(I)  as  a  function  of  n.  The  (a(f),(3(!))  values  are  given  in  Table  3.3.2. 
If  (Yk:  k  >  0)  is  an  accurate  approximation  of  (Xk:  k  >  0)  then  P(I)  should  be  approximately 
max{3,4-2S}.  Included  in  Table  3.3.2  are  the  values  of  max{3,4-25),  which  are  listed  under 
P(I*).  Just  as  in  the  analysis  of  the  rate  of  growth  of  J  and  T,  our  predictions  for  i  tend  to  be  more 
accurate  for  sparse  graphs. 

Note  that  P(I)  -  P(J)  is  approximately  2  for  small  values  of  5  (p(I)  -  p(J)  =  2.1,  1.9,  1.9, 
1.7,  and  1.4  for  8  equal  to  0,  0.25,  0.50,  0.75,  and  1.0,  respectively).  To  see  why  this  is  so, 
observe  that  our  choice  of  T  is  small  so  that  the  process  (X; ^y  k  >  0)  behaves  like  the  process 
<Xk:  k  >  0).  Therefore,  the  sample  mean  of  i  should  be  approximately  the  sample  mean  of  )  times 
the  average  value  of  J(k+1)  -  J(k)  over  k  such  that  0  S  k  <  k*  and  J(k‘)  =  i.  Since  T  is  small,  this 
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Figure  3.3.4.  Tables  of  sample  mean  m(I);  sample  standard  deviation  o(I);  and  our  estimate  a(I)  of 
1.  based  on  approximation  (Yk:  k  >  0)  of  (Xk:  k  >  0).  The  graphs  considered  have  n  nodes  and  m 

—  I  n  .  n  t  I  arlrrne 


Table  33.2.  Function  e“mn^  of  n  is  the  best-fit  curve  of  m(I)  as  a  function  of  n.  The  value  of 
(3(1*)  is  the  estimate  of  (3(1)  based  on  the  approximation  (Yk:  k  £  0)  of  (Xk:  k  2  0). 
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average  will  be  roughly  the  average  amount  of  time  (Xk:  k  £  0)  takes  to  leave  a  maximal  matching 

that  has  size  proportional  to  n.  This  average  time  is  approximately  - =  — -n2.  Then 

nexp(-l/T)  3 


(3(1)  -  (3(J)  should  equal  to  2. 


Ln  Figure  3.3.5,  we  plot  three  graphs  of  m(i),  ea(^np(J),  and  3n3logn  +  7.5-^r-  versus  n  for  5 

d2 

equal  to  0,  0.5,  and  1.0.  These  graphs  indicate  that  the  best-fit  curves  are  accurate  approximations 

i  n4 

of  the  sample  mean  versus  n.  From  these  graphs  it  also  seems  that  3n  logn  +  7.5—  is  an  asymp¬ 


totic  upper  bound  for  m(I). 

We  conclude  this  section  with  some  final  remarks.  Our  experiments  show  that  the  process 
(Yk:  k  >  0)  is  a  reasonable  approximation  of  (Xk:  k  >  0),  and  if  the  graphs  are  sparse  then 
(Yk:  k  >  0)  is  an  accurate  approximation  of  (Xk:  k  >  0).  This  is  not  surprising,  since  one  of  our 
underlying  assumptions  used  to  define  (Yk:  k  >  0)  is  that  the  graph  is  sparse.  Note  that 
(Yk;  k  >  0)  leads  to  estimates  that  seem  to  be  asymptotic  upper  bounds  on  the  average  amount  of 


time  (Xk:  k  >  0)  and  (Xk:  k  >  0)  take  to  find  a  maximum  matching.  If  the  estimates  are  actual 
asymptotic  upper  bounds  then  for  almost  every  graph  (3.2.11)  is  an  upper  bound  for  (3.2.1)  if  c  £ 
1.5.  Note  that  if  d  is  bounded  below  by  one  then  (3.2.11)  is  0(n4).  Also  note  that  the  sample 
mean  of ’t  is  at  least  half  the  sample  mean  of  J.  This  is  experimental  evidence  which  helps  to  jus¬ 
tify  another  of  our  assumptions  used  in  defining  (Yk:  k  S  0):  the  number  of  exposed  nodes  is 
small. 

3.4.  Conclusions 

In  the  previous  chapter,  we  presented  results  which  showed  that  solving  the  matching  prob¬ 
lem  by  the  basic  simulated  annealing  algorithm  of  Section  2.1.2  takes  average  time  that  is 
exponential  in  the  size  of  the  instance.  In  addition,  we  also  showed  that  to  find  a  near  maximum 
matching  only  takes  average  time  that  is  polynomial  in  the  size  of  the  instance.  In  this  chapter,  we 
found  an  estimate  (3.2.10)  of  a  small  upper  bound  on  the  average  amount  of  time  the  basic  simu¬ 
lated  annealing  algorithm  takes  to  find  a  maximum  matching  for  typical  graphs  with  n  nodes  and  m 
edges.  If  m  >  n  then  (3.2.10)  is  0(n4).  We  checked  this  estimate  with  data  from  simulations  and  it 
seems  that  this  estimate  is  an  asymptotic  upper  bound. 

An  important  reason  why  we  were  able  to  analyze  the  performance  of  simulated  annealing  for 
the  matching  problem  on  "typical"  graphs  was  that  we  were  able  to  approximate  the  annealing  pro¬ 
cess  (Xk:  k  >  0)  by  a  process  (Yk:  k  >  0)  that  was  quite  homogeneous.  In  Section  4.2,  we  present 
a  method  of  generating  homogeneous,  easy-to-analyze  annealing  processes. 


CHAPTER  4 


THE  TEMPLATE  METHOD,  THE  THRESHOLD  RANDOM  SEARCH  ALGORITHM, 
AND  THE  NONMONOTONICITY  OF  OPTIMAL  TEMPERATURE  SCHEDULES 

4.1.  Introduction 

In  this  chapter,  a  collection  of  miscellaneous  results  is  presented.  In  Section  4.2,  we  give  a 
simple  technique,  called  the  template  method,  that  produces  easy-to-analyze  annealing  processes. 
A  random  search  algorithm,  which  we  refer  to  as  the  threshold  random  search  algorithm  is 
presented  in  Section  4.3.  The  algorithm  is  a  generalization  of  simulated  annealing.  In  Section  4.4, 
sets  of  conditions  are  given  under  which  no  monotone  decreasing  temperature  schedule  is  optimal. 

4.2.  The  Template  Method 

In  Chapters  2  and  3,  bounds  and  estimates  on  the  average  amount  of  lime  simulated  anneal¬ 
ing  takes  to  solve  the  matching  problem  were  derived.  However,  we  do  not  know  of  any  other 
nontrivial  combinatorial  optimization  problems  amenable  to  such  analysis.  One  of  the  reasons  why 
simulated  annealing  applied  to  solving  the  matching  problem  could  be  analyzed  is  that  there  is  a 
great  deal  of  homogeneity  in  the  annealing  process. 

In  this  section,  a  simple  method  we  call  the  template  method  will  be  given  that  produces 
annealing  processes  that  have  a  great  deal  of  homogeneity  and,  as  a  result,  are  easy  to  analyze. 
The  annealing  processes  produced  by  the  template  method  are  homogeneous  so  that  the  states  can 
be  classified  into  a  relatively  small  number  of  types  of  states.  In  addition,  for  any  state  s  and  type 
a,  the  transition  probability  from  s  to  some  state  of  type  a  that  has  cost  5  larger  than  the  cost  of  s 
is  dependent  only  on  a,  6,  the  type  of  s,  and  the  temperature  schedule.  Our  hope  is  that  the  use  of 
this  method  will  produce  interesting  annealing  processes  that  will  help  us  to  better  understand  the 
simulated  annealing  heuristic.  We  will  begin  by  presenting  this  method  and  then  give  three  exam- 
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pics  of  its  use. 

For  the  template  method,  we  need  a  finite  (preferably  small)  set  S°  of  state  types,  a  set  Ac  of 
real  numbers  (corresponding  to  changes  in  cost  of  the  states),  and  a  transition  probability  matrix  R° 
from  states  in  S°  to  states  in  S°xAc  =  {(s,5):  seS°,  Ac } .  We  will  also  require  the  following  con¬ 
dition  on  R°.  Let  be  the  probability  transition  matrix  over  S°  such  that 

X  R°(t-.5)exP(-rnaxf0-5}^r) 

if  S  *  S' 

1  -  IQ«.t  if  s  =  s'. 

I** 

w 

We  assume  that  for  each  T  >  0,  the  Markov  chain,  which  has  states  S°  and  transition  probability 
matrix  Q^,  is  irreducible  and.  therefore,  ergodic.  Then  this  Markov  chain  has  a  limiting  distribu¬ 
tion  ?trr’  on  S°,  which  we  can  compute  by  solving  the  system  of  linear  equations  =  ^(T'. 

We  call  the  tnple  (S°,AC,R°)  the  template  system,  and  we  call 

Xx,™  X  5R°l.6)exp(-max(0.5}n’) 

te  S°  (j'.6)€  S®*Ac 

the  average  drift  of  the  template  system  (S^A'J*0)  for  temperature  T. 

The  following  is  one  interpretation  of  the  average  drift  of  the  template  system.  Let  a  set  of 
states  S~,  a  cost  c~  on  S",  and  a  transition  probability  matrix  R”  over  S”  be  defined  as  follows. 
The  set  S“  is  the  set  of  all  finite  sequences  of  the  form  [(So,50),  (Sj,5I),  ....  (s,,,5n)J,  where  n  >  0, 
and  s,  e  S°  and  5,  €  Ac  for  ail  i  such  that  0  5  i  S  n.  Let  a  €  S",  and  let  (s,5)  be  the  last  pair  in 
the  sequence  a.  Then  say  a  is  a  state  of  type  s.  The  cost  c“  of  a  sequence 

n 

[(s0,50).  (Si,5t),  .  .  .  ,(sn,5n)]  is  zero  if  n  =  0  and  is  if  n  ^  1.  The  matrix  R~  is  such  that 

i-i 

fR°(j.8)  if  a  is  a  state  of  type  i  and  a'=al  (j.5) 

0  otherwise, 

where  al(j,5)  is  the  sequence  formed  by  appending  (j,5)  to  the  sequence  a. 


Let  (Xk:  k£0)  be  an  annealing  process  on  the  system  (S“.c",R”)  and  with  a  temperature 
schedule  that  has  all  of  its  temperature  values  equal  to  T.  Then  it  is  easy  to  see  that 
lim  E[(c“(Xk)-c“(X0))/k]  is  equal  to  the  average  drift  of  the  template  system  for  temperature  T 

A  simple  example  of  a  template  system  (S°^C.R°)  is  when  S°  =  ( A,B).  Ac  =  {-1,1},  and  R° 
is  such  that 

1/2  if  i  =  A,  j  e  (A,  B } ,  and  8  =  1 
1/2  if  i  =  B,  j  =  A.  and  8  =  1 

ru0.5)  =  1/2  if  i  =  B.  j  =  B,  and  5  =  -1 

0  otherwise  . 

k 

A  state  diagram  of  the  system  (S“,c“,R“)  is  partially  presented  in  Figure  4.2. 1.  Each  node 
corresponds  to  a  state  of  S”  and  inside  each  node  is  the  type  of  the  state.  The  cost  c"  of  the  states 

is  indicated  on  the  left  of  the  figure.  An  arrow  from  a  state  i  to  a  state  j  indicates  that  R“  *  -j 

Another  example  of  a  template  system  is  given  by  S°  =  {0,1,2},  Ac  = 

{-2x106,-2x10‘,102,104,106,108},  and  the  matrix  R°,  which  is  defined  by 

f 

1/4  if  i=0,  (j.S)e  {(1.102),(0.104).(2.ia6).(0.108)} 
l  if  i=l,  j=0,  and  8  =  -2x1 O2 

Ri.(j.8)  =  i  if  i=2<  j=o,  and  8  =  -2xl06 

0  otherwise 

k 

The  average  drift  for  the  template  system  (S°,ACJR°)  is 

-10V10Vr+  10V1QVr  -  106e~l0*a  -i-  108e-lo,/T 

4  +  e-'^  +  e-104^ 

Note  that  this  average  drift  is  not  unimodal  in  T.  The  implication  is  that  we  cannot  guarantee  that 
an  iterative  descent  method  will  find  a  value  of  T  that  will  minimize  the  average  drift. 

From  this  result  we  can  also  conclude  that  the  average  amount  of  time  the  following  finite 
state  annealing  process  takes  to  find  an  optimal  solution  is  not  unimodal  as  a  function  of 
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temperature.  Let  S  be  the  set  of  all  pairs  of  integers  (ij)  such  that  i  e  (0,  1,  2)  and  j  e  (0,  1.  .  . 
,  lO100}.  Let  c((ij))  =  j,  and  let  R  be  a  transition  probability  matrix  R  over  S  such  that  if 
(i|Ji)  *  (i;j2)  then 


and 


1/4 


1/4 
’  1 
1 

0 


if  i,  =  0,  j2  £  lO100  and  (i2.jV>  €  {(0j,+104),  (0j,+108)) 

if  i,  =  0.  j2  <  lO100  and  (i2jj)  €  {(1  j^IO2),  (2JJ+106)} 

if  i,  =  1.  j2  2  0.  and  (i2,jz)  =  (0jt— 2X102) 

if  i,  =  2,  j2  5  0.  and  (i2,j2)  =  (0j,-2xl06) 

otherwise, 


Ro^Xmu.)  ^  .2-.  ^0iJiXi2j2)- 

(‘2J2)*('l  Jl) 

Then  the  annealing  process  (Xk:  k  >  0)  on  system  (S,  c,  R)  and  with  a  temperature  schedule  that 
has  all  of  us  values  equal  to  T  is  such  that  E[min(k  :  c(Xk)  is  of  minimum  value}  |Xo  =  (O.IO100)] 
is  well  approximated  by  lO100  divided  by  the  magnitude  of  the  average  drift  of  the  template  sys¬ 
tem  (S°,AC.R°)  with  temperature  T  if  this  average  drift  is  negative.  Hence.  E[min{k:  cCXJ  is  of 
minimum  value)  |X0  =  (O.IO100)]  is  not  a  unimodal  function  of  T,  as  stated  earlier. 


Before  presenting  the  third  and  final  example  of  a  template  system  we  give  some  useful 
definitions  Suppose  3  is  a  set,  c  is  a  cost  on  S,  and  R  is  a  transition  probability  matrix  over  S. 
We  say  that  state  i  is  reachable  at  height  E  from  state  j  if  there  is  a  sequence  of  states  j  =  i(0), 

i(  1 ) . if p)  =  i  such  that  R,ot>,fk+i)  >  0  for  0  £  k  <  p  and  c(i(k))  <  E  for  0  £  k  <  p.  State  s  is 

said  to  be  a  local  minimum  if  no  state  s'  with  els')  <  c(s)  is  reachable  from  s  at  height  c(s).  We 
define  the  depth  of  a  local  minimum  s  to  be  plus  infinity  if  s  is  a  global  minimum.  Otherwise,  the 
depth  of  s  is  the  smallest  number  E  >  0,  such  that  some  state  s',  such  that  els')  <  c(s),  can  be 
reached  from  s  at  height  c(s)  +  E. 

For  our  third  example,  let  n,  X,  0,  and  D  be  positive  integers,  where  X  <  n  +  1  and 
0  <  2  +  n  Let  S°  =  (0.1 . D),  Ac  =  {-1,1 },  and  R°  be  such  that 


.v.v.**, 


<>s 


Xyn 

if  i  <  D  and  (J.5)  =  (i+1.1) 

l/n 

if  i  >  0  and  (j,6)  =  (i-1,-1) 

1-A;n 

if  i  =  0  and  (j.6)  =  (i,0) 

1  — (X-t- 1  )/n 

if  0  <  i  <  D,  and  (j.5)  =  0.0) 

l/n 

if  i  =  D  and  (j.6)  =  (D.-l) 

$/n 

if  i  =  D  and  (j.5)  =  (D.l) 

l-<2-Hj>)/n 

if  i  =  D  and  (j.5)  =  (D.0) 

P 

otherwise. 

The  average  dnft  of  the  template  system  for  the  temperature  T  is 

_£ex2j_-lT)Xj^_  j_1_1  +  cxp(_1/T)4)] 

Xfexpf-l/DA.)1  ° 

rO 


(4.2.1) 


The  motivation  for  our  third  example  is  that  the  average  drift  is  approximately  inversely  pro¬ 
portional  to  the  average  amount  of  time  a  finite  state  annealing  process  (Xk:  k  £  0),  on  the  follow¬ 
ing  system  takes  to  find  a  minimum  valued  solution.  As  we  shall  see.  the  average  amount 

of  time  (Xk:  k  >  0)  lakes  to  find  a  minimum  valued  solution  is  dependent  on  the  density  of  states, 
density  of  states  around  local  minima,  and  the  depth  of  local  minima  of  the  annealing  process. 
Suppose  D  «  10100,  and  let  S  be  the  set  of  all  quadruples  of  integers  (i.j.djc),  where 
0  <  i  <  !  0 1  °°,  0  <,  j  <  0  £  d  <  D,  and  0  £  k  <  Xd  Let  c  be  a  cost  function  on  S  such  that 

cKi.j.d.kji  =  i  +  d  Let  G  be  any  graph  with  node  set  S  and  edge  set  E  that  has  propenies  to  be 
specified  later  Let  R  be  a  symmetric  transition  probability  matrix  on  S  such  that 


R, 


=  l/n  if  s  *  s'  and  [s,s']  e  E 

0  if  s  *  s'  and  (s,s')  4  E 

=  1  -  IR|{  if  s  =  s' 


We  will  now  present  the  properties  of  G.  Set  S  can  be  partitioned  into  subsets  T(i,j)  of 
nodes  for  i  and  j  such  that  0  S  i  <  10100  and  0  <  j  <  $>'.  where  T(i.j)  =  j(i.j.d.k):  0  <  d  <  D  and  0 
<  k  <  >.J[.  The  subgraph  of  G  induced  by  subset  T(i,j)  is  a  completely-balanced.  Xary  tree  of 


height  D  and  has  root  (i.j.0,0),  (see  [18)  for  the  definition  of  a  tree,  a  root  of  a  tree,  a  leaf  of  a 


tree,  and  the  distance  between  nodes  on  a  tree).  Also,  for  each  i  and  j,  all  nodes  (ij,d,k)  are  at 
distance  d  away  from  root  (i,j,0,0).  Note  that  c((i,j,d,k))  =  d  +  c( f i.j.0,0) ).  In  Figure  4.2  2,  an 
example  of  a  subgraph  induced  by  T(ij)  is  given  as  well  as  the  cost  of  the  nodes.  For  all  i  and  j, 
the  only  nodes  of  T(i,j)  that  have  neighboring  nodes  not  in  Tfi.j)  are  the  leaves  of  the  subgraph 
induced  by  Tfi.j).  Note  that  each  node  (i.j.D.k)  is  a  leaf  of  the  tree  induced  by  subset  T(ij)  If  1  < 
lO100  (respectively,  i  =  lO100)  then  the  nodes  of  the  set  {(i+lj.DJc):  A.(j-1)  <.  j  <  Aj)  (respectively, 
0)  are  the  only  nodes  that  are  neighbors  of  leaf  (t.j.DJc)  such  that  each  node  is  in  a  subset  of  T(ij) 
for  some  i  >  i  and  (i  j)  *  (i,j).  Hence,  the  edges  of  E  are  such  that  the  leaves  of  the  tree  induced 
by  Tfi.j)  are  paired  with  the  leaves  of  the  tree  induced  by  T(i+l,j)  where  j  is  such  that  X(j-l)  <  j  < 
/.j  Note  that  the  graph 


induced  by  contracting  subset  T(ij)  into  a  node  for  each  i,  j  is  a  completely  balanced  4>ary  tree. 
The  edges  between  the  induced  trees  T(i.j)  are  illustrated  in  Figure  4.2.3. 

Let  (Xk.  k  £  0)  be  the  annealing  process  on  system  (S,c,R)  with  temperature  schedule 
fTk:  k  £  0)  such  that  Tk  =  T  Thus  implies  that  if  (4.2.1)  is  negative  then  the  average  amount  of 
ume  it  takes  the  annealing  process  to  find  a  minimum  valued  solution  given  X0  =  ( 10IO° ,0,0,0)  is 
approximate^  1()100  divided  by  the  magnitude  of  (4.2.1).  as  we  stated  earlier.  Note  that  $  is  the 
rate  at  which  the  density  of  states  of  (Xk:  k  £  0)  is  increasing  with  cost.  D  is  the  depth  of  all  the 
local  minima,  and  \  is  the  rate  at  which  the  density  of  states  close  to  a  local  minima  is  increasing 


widi  cost 


To  keep  (4.2.1)  small  one  can  set  exp(-l/T)  = 


Then  (4.2.1)  becomes 


<£>° 

2" 

j-o  24> 


(4.2.2) 


If  <j>  «  X  then  (4.2.2)  is  approximately  — .  If  $  »  X  then  (4.2.2)  is  approximately  - 

2n  2n 

Hence,  if  <{>  »  X,  the  value  of  D  strongly  influences  the  average  amount  of  time  (Xk:  k  £  0)  takes 
to  find  a  minimum  valued  state.  If  »  X.  then  the  average  drift  goes  to  zero  exponentially  as  D 
increases  and,  therefore,  the  average  amount  of  time  (Xk:  k  £  0)  takes  to  find  a  minimum  valued 
state  grows  exponentially  as  D  increases. 


We  believe  that  for  typical  annealing  processes,  states  around  local  minima  that  have  cost 
much  higher  than  the  cost  of  the  global  minima  corresponds  to  the  situation  where  \  If  this  is 
true  then  the  depth  of  local  minima  will  not  be  so  important  to  the  dnft  of  the  annealing  process 
until  the  process  is  in  a  near  optimal  state. 


4.3.  The  Threshold  Random  Search  Algorithm 

The  threshold  random  search  algorithm  is  used  to  solve  combinatorial  optimization  problems: 

min{c(s):  s  e  S } . 

where  S  is  a  finite  set  of  states  and  c  is  a  cost  function  of  S.  Just  as  with  simulated  annealing,  the 
threshold  random  search  algorithm  requires  a  transition  probability  matrix  R  ever  S.  In  addition,  a 
sequence  of  posiuve  random  variables  (l*:  k£0),  we  call  the  threshold  schedule,  is  needed.  The 
threshold  random  search  algorithm  generates  a  sequence  of  states  (Xk:  k20)  as  follows  An  initial 


state  X0  is  chosen  from  S  Given  that  Xk  =  s.  a  potential  new  state  Yk  is  chosen  from  S  with  pro- 
babilits  distribution  P(Yk  =  s'  j  Xk  =  s]  =  R„  Then  we  set 


JYk  if  c(Yk)  -  c(Xk)  £  tk 
-  |xk  otherwise. 

If  tk  =  -Tklog(Ut),  where  (Tk:  k>0)  is  a  temperature  schedule  and  (Uk:  k>0)  is  a  sequence  of 
independent  random  variables  distributed  uniformly  over  the  interval  (0. 1  ]  then 

R(J  exp(-max{0,c(j)  -  c(i)}/Tk)  ifi*j 
1  -  £  pfxk+i  =  h|Xk  =  i]  otherwise. 

Thus,  for  this  threshold  schedule  the  threshold  random  search  algorithm  is  equivalent  to  simulated 
annealing. 

Let  (Xkl)  :  k  2  0)  be  the  process  generated  by  the  threshold  random  search  algorithm  with 
threshold  schedule  t.  It  can  be  shown  that  there  exists  a  threshold  schedule  that  minimizes 
E;minik.  c<Xkl))  £  y}].  More  generally,  as  we  shall  see  in  the  nexi  corollary,  it  can  be  shown  that 

there  exists  a  threshold  schedule  that  minimizes  F,  where  F(t)  =  Ei^f^fX^0 . X,(l>)],  for  some 

i-0 

nonnegative  functions  fl).  Note  that  if  f'lXo.  •  •  •  .xt)  =  I,c(Xj)  >  T  for  »u ,  .uch  uih  o  s  ,  s  •)  then  F(t)  = 
E;minjk.  c(Xk<;)  <  y}]. 

The  corollary  is  implied  by  the  next  proposition  and  the  following  simple  observation:  for 
every  threshold  schedule  t,  there  is  a  sample  path  t'  of  t  such  that  F(t)  >  F(0-  Therefore,  when 
searching  for  threshold  schedules  that  minimize  F  we  can  restrict  our  attention  to  deterministic 
ones  Without  loss  of  generality  we  will  only  consider  deterministic  threshold  schedules  (tk:  k>0), 
■shore  tk  €  A,  for  all  k  >  0.  and  A  =  |c<j)  -  c(i):  i.j  e  S.  RtJ  >  0.  and  c(j)  >  c(i))^j(0).  Let  A  be 
the  set  of  such  kinds  of  deterministic  threshold  schedules. 


P[Xk*,  =  j  I  Xk  =  i]  = 


Proposition  4  3  1  The  functional  F  is  minimized  on  A 


'3 


Proof:  It  is  straightforward  to  show  that  A  is  compact  with  respect  to  the  distance  metric 

5((tk:  k>0),(sk:  k>0))  =  X2‘*|tk-sk|. 

k20 

We  will  now  show  that  F  is  lower  semicontinuous  on  A,  which  will  complete  the  proof.  Since  A 

is  finite  and  Eff^CX^,  X[l) . X|P)]  depends  only  on  the  first  k  +  1  elements  of  the  sequence  t, 

E[f<k)(xP.X1(t) . XP)]  is  continuous  on  A.  Thus,  F  is  lower  semicontinuous  because  it  is  the  sum 

of  a  collection  of  nonnegative  continuous  functions. 

□ 

Corollary  4.3.1 :  There  is  a  deterministic  threshold  schedule  that  minimizes  F. 

Remark :  A  simulated  annealing  algorithm  will  typically  be  outperformed  by  a  threshold  ran¬ 
dom  search  algorithm  with  some  deterministic  threshold  schedule.  However,  this  is  only  of 
theoretical  interest,  at  this  point,  since  the  problem  of  finding  optimal  threshold  schedules  is 
difficult. 


4.4.  The  Nonmonotonicity  of  Optimal  Temperature  Schedules 

Most  analytical  studies  of  simulated  annealing  consider  only  monotone  decreasing  tempera¬ 
ture  schedules.  This  is  not  surprising,  since  the  purpose  of  the  heuristic  is  to  simulate  an  "anneal¬ 
ing"  process.  In  this  section,  we  present  sets  of  conditions  under  which  no  monotone  decreasing 
temperature  schedule  is  optimal.  We  will  focus  on  the  basic  simulated  annealing  algorithm  in  Sub¬ 
section  2.1.2,  which  is  used  to  solve  the  matching  problem  for  a  graph  G.  Let  (S,  c,  R)  be  as  in 
Subsection  2.1.2,  and  let  (X^:  k>0)  be  the  annealing  process  with  temperature  schedule  T,  where 
X,)T'  =  0.  The  following  proposition  contains  the  main  results  of  this  section. 

Proposition  4.4.1 :  Suppose  there  is  a  maximal  but  not  maximum  matching  M  of  G  such  that 


i M  |  <  m.  Then 
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(a)  there  is  a  temperature  schedule  T  that  minimizes 

E(min{k:  |Xkm(>m}], 

and  for  every  J  >  0  there  is  temperature  schedule  T  that  minimizes 

P[|xr>|>m]; 


(4.4.1) 


(4.4.2) 


(b)  there  is  a  monotone  decreasing  temperature  schedule  T  that  minimizes  (4.4.1)  if  and  only  if 
the  infinity-valued  temperature  schedule  (i.e.,  Tk  =  »  for  all  k)  minimizes  (4.4.1);  and  (c)  there  is 
a  J  such  that  if  J  >  J  then  there  is  no  monotone  decreasing  temperature  schedule  that  minimizes 
(4.4.2). 

Remark:  If  G  is  a  single  path  consisting  of  2n+l  nodes,  m  =  0.9n  and  n  is  sufficiently  large, 
then  the  infinity-valued  temperature  schedule  does  not  minimize  (4.4.1).  To  see  this,  observe  that 
by  Theorem  2.3.1  there  is  a  temperature  schedule  such  that  (4.4.1)  is  0(n5).  Now  suppose  that  T 
is  the  infinity-valued  temperature  schedule.  If  a  matching  M  of  G  is  such  that  |M|  >  0.75n,  then 
the  number  of  matchable  edges  relative  to  M  is  at  most  0.5n.  The  implication  is  that  the  ratio  of 
|M|  over  the  number  of  matchable  edges  relative  to  M  is  at  least  0.6  and,  hence,  since  T  is  the 
infinity-valued  temperature  schedule,  P[  |Xmin(k  >  xk  *  Xj)  |<|M  1 1  X,=M]  >  0.6.  Then  it  is  straight¬ 
forward  to  show  that  (4.4.1)  grows  exponentially  with  n.  Therefore,  for  this  graph  G,  none  of  the 
optimal  temperature  schedules  are  monotone  decreasing  if  n  is  large  enough. 

To  prove  the  proposition  we  will  use  the  next  lemma,  which  may  be  interesting  in  its  own 
right.  In  the  lemma  we  refer  to  a  process  (X^:  k  £  0),  which  is  a  more  general  form  of  the  pro¬ 
cess  (Xjf1^:  k  >  0).  We  also  refer  to  a  functional  F(T)  of  temperature  schedule  T,  where  F(T)  = 

■  ■  .  ,Xjn))]  and  is  nonnegative.  If  (X^:  k  >  0),  is  equal  to  (X^:  k  £  0)  then 
both  (4.4.1 )  and  (4.4.2)  have  the  form  of  F(T). 

Lemma  4  4  1:  Let  S  be  a  set  of  states,  let  t  be  a  cost  on  that  set,  and  let  R  be  a  transition 


a:nx  over  S  such  that  if  >  0  and  2(j)  >  t(i)  then  5(j)  -  t(i)  =1.  Let  (X|P  :  k  £  0) 
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be  the  annealing  process  on  (S,  t,  R),  with  temperature  schedule  T.  Let  F(T)  = 

oo 

ElX^fXo"^,  .  .  .  .xf^J,  where  f®  is  nonnegative.  Then  there  is  a  temperature  schedule  that 
>=0 

minimizes  F  and  is  monotone  decreasing  if  and  only  if  there  is  a  temperature  schedule  (Tk:  k  >  0) 
that  minimizes  F,  is  monotone  decreasing,  and  Tke  {0, <»}  for  all  k  £  0. 

Remark:  If  T  is  a  zero-infinity  valued  temperature  schedule,  the  annealing  process 
(Xkm  :  k  >  0)  then  there  is  a  random  threshold  search  algorithm  with  a  deterministic  threshold 
schedule  that  can  produce  an  equivalent  process. 

Proof  of  Lemma  4.4.1:  The  "if’  part  of  the  lemma  is  immediate.  We  now  turn  to  prove  the 
"only  if'  part.  Let  T  =  (Tk:  k  >  0)  be  an  optimal  temperature  schedule  that  is  monotone  decreas¬ 
ing.  For  each  j  >  -1,  we  will  define  a  temperature  schedule  T®  =  (Tp:  k>0)  inductively  as  fol¬ 


lows.  Let  T*  =  T.  For  each  j  >  0, 


Tv®  = 


for  all  k  >  0. 


0  if  k  =  j  and  Tj®1*  < 
Tk<H>  otherwise. 


We  will  now  show  that  F(T®)  =  FCT®1)),  for  j  >  0,  by  the  following  inductive  argument. 

>  J 

Suppose  T®!)  minimizes  F  and  T®1^.  Note  that 

r* 

V; 

F(T®^)  =  X  •  •  •  -  XifIthl,))|x/ni'1>)  =  a,  =  p] 

'■  S  i=0 

X  =  |3  Ixf^  =  o]Ptxi<^">  =  a], 

^  and  the  only  terms  in  the  expression  that  depend  on  T®!)  are  PtX^1^  =  P  |  xf^'^  =  a]  for 

^  a,p€  S.  Since  Ry  >  0  and  c(j)  >  c(i)  imply  c(j)  -  c(i)  =  1,  PIX^'^  =  p  |  X^lf*  =  a]  equals  one 

or  expC-l/T®1^.  Hence,  FCT®1*)  has  the  form  A'  +  B'expC-l/T®1*),  where  A'  an  1  B'  are  terms 

*  ** 

not  dependent  on  T®^.  Then  it  must  be  that  B'  >  0,  because  T®])  minimizes  F  and  T®1)o> 

* 


Therefore,  FCT^  =  A'  <  A'  +  B'exp^l/Tj0-1*)  =  FCT^1*)  for  all  j  >  0.  By  the  optimality  of  T, 
F (T0*)  =  F(T)  for  all  j  >  0. 

Let  T<~)  =  (Tk“*:  k>0)  be  such  that  Tk(“)  =  °°,  if  Tk  =  »,  and  =  0,  otherwise.  Using  an 
argument  similar  to  the  one  used  in  the  proof  of  Proposition  4.3.1,  one  can  show  that  F  is  lower 
semicontinuous  on  the  set  of  all  temperature  schedules  under  the  distance  metric 

oo 

5(t,s)  =  £2~‘|exp(-l/tj)  -  exp(— l/sL)  I .  Since  T'“)  is  the  pointwise  limit  of  T(i)  and  F(T)  =  F(T®) 

i=0 

for  all  positive  j,  =  F(T).  Thus,  T(“)  is  optimal,  zero-infinity  valued,  and  monotone 

decreasing,  and  we  are  done. 

□ 

Proof  of  Proposition  4.4.1.  Part  (a)  follows  from  our  argument  in  the  proof  of  Lemma  4.4.1  that  F 
is  lower  semicontinuous  on  the  set  of  all  temperature  schedules,  and  the  fact  that  the  set  of  all  tem¬ 
perature  schedules  is  compact  under  the  distance  metric  5. 

We  now  turn  to  prove  part  (b).  The  fact  that  R  is  symmetric  and,  for  all  i,  j  e  S,  there  is  a 
sequence  of  matchings,  i  =  i(l),  i(l),  ....  i(k)  =  j,  such  that  Ri(h>i(h+i)  >  0  for  all  h  such  that  1  <  h 
<  k,  implies  that  the  infinity-valued  temperature  schedule  leads  to  a  finite  value  for  F.  Write  M  = 

(e^  &2’  ■  ■  •  ,  en)*  30(1  let  Mo  -  0.  and  Mj  =  (ej,  % . e^  for  all  i>0.  Since  =  0,  for  all  i 

such  that  1  <  i  <  n,  RM|_|M.  >  0,  RMim^,  >  0,  and  iM^jl  >  |M,|,  then,  for  any  temperature 
schedule  T,  P[X|[^e  (Mq,  .  .  .  ,Mn}]  >  0  for  all  k  £  0.  Hence,  F  is  infinite  for  all  monotone 
decreasing  zero-infinity  valued  temperature  schedules,  with  the  exception  of  the  infinity-valued 
temperature  schedule.  We  can  then  conclude  that,  the  only  monotone  decreasing  zero-infinity 
valued  temperature  schedule  that  can  possibly  be  optimal  is  the  infinity-valued  one.  The  previous 
conclusion  and  Lemma  4.4.1  imply  there  is  a  temperature  schedule  T  that  minimizes  (4.4.1)  and  is 


monotone  decreasing  if  and  only  if  the  infinity-valued  temperature  schedule  minimizes  (4.4.1). 
Thus,  part  (b)  is  proved. 

We  will  now  turn  to  prove  part  (c).  Since  R  is  symmetric  and  for  all  i,  j  e  S,  there  is  a 
sequence  of  matchings,  i  =  i(l),  i(l),  ....  i(k)  =  j,  such  that  Ri(h>i(h+i)  >  0  for  all  h  such  that  1  £  h 
<  k,  we  know  that 

lim  P[Xkm6  (M0 . MJ] 

k— *oo 

exists  and  is  strictly  positive  if  T  is  the  infinity-valued  temperature  schedule.  Then  there  is  a  J 
such  that 

inf  P[Xjme  {Mq . M„}]  >  0.  (4.4.3) 

Inequality  (4.4.3),  the  fact  that  M  is  maximal,  and  the  fact  that  RMm  M>0  and  IM^I  >  |M{  | ,  for 

all  i  such  that  1  <  i  <  n,  imply  that  there  exists  an  e  >  0  such  that  inf  P[|X/T)|<m]>e  for  any 

i 

monotone  decreasing  zero-infinity  valued  temperature  schedule  T.  However,  Geman  and  Geman 
[4],  Hajek  [13],  among  others,  have  shown  that  there  is  a  temperature  schedule  T  such  that 

Um  PtlX^I  <  m]  =  0. 

Hence,  there  is  a  J  such  that  if  J  >  J  then  no  monotone  decreasing  zero-infinity  valued  temperature 
schedule  minimizes  (4.4.2).  Then  Lemma  4.4.1  implies  that  there  are  no  monotone  decreasing 
temperature  schedules  that  are  optimal.  Part  (c)  is  now  proved  and  we  are  completely  done  with 
the  proof  of  the  proposition. 


CHAPTER  5 


OPTIMIZATION  WITH  EQUALITY  CONSTRAINTS 

5.1.  Optimization  with  Equality  Constraints 

In  this  chapter,  we  consider  solving  the  following  equality  constrained  problem  (ECP)  by 
simulated  annealing. 

(ECP)  minimize  f(x) 

subject  to  x  e  X,  h(x)  =  0, 

where  X  is  the  set  of  solutions,  f:  X-*R,  and  h(x)  =  (h^x),  h2(x) . hk(x))T  for  some  k  £  1. 

We  can  also  include  inequality  constraints  in  this  form,  since  g(x)  <  0  can  be  written  as 
max{0,g(x)}  =  0.  A  solution  x  e  X  is  called  feasible  if  h(x)  =  0. 

As  an  example,  the  following  optimization  version  of  the  Graph  Partitioning  Problem  f  19] 

can  be  written  as  an  equality  constrained  problem.  An  instance  of  this  problem  is  a  graph  (V,E),  a 

K 

positive  integer  K,  and  a  set  of  positive  integers  .  .  .  ,oK  such  that  £  °i  =  |V|.  We  call 

t*l 

(V i,V2,  .  .  .  ,  VK)  a  partition  of  V  if  each  V,  is  a  subset  of  V,  the  are  disjoint,  and  their  union  is 
V.  The  capacity  of  the  partition  (Vj, . .  .  .V^)  is  the  number  of  edges  [u.v]  such  that  if  u  e  V; 
and  v  e  Vj  then  i  *  j.  The  problem  is  to  find  a  partition  (V,,  ....  V^)  with  minimum  capacity 
and  such  that  jVj  -  O;  =  0  for  i  =  1,  2 . K. 

Another  optimization  problem  that  can  be  written  as  an  equality  constrained  problem  is  the 
optimization  version  of  the  Minimum  Cut  Linear  Problem  [19].  An  instance  for  this  problem  is  a 
graph  (V,E).  The  problem  is  to  find  a  mapping  Jt:  V— >{1,2 . | V j }  that  will  minimize, 


max  I  l(u,v)  e  E:  rt(u)  <  i  <  Jt(v)}  | 
i<a<|v| 

subject  to  |  { v  e  V:  n(v)  =  j)  |  -  1  =  0  for  j  =  1,  2 . |V|. 

Penalty  methods  are  techniques  used  to  solve  equality  constrained  problems.  The  basic  idea 
of  the  methods  is  to  substitute  some  or  all  of  the  equality  constraints  by  adding  to  the  cost  function 
penalty  terms  that  give  a  high  cost  to  infeasible  points.  In  this  section,  we  will  focus  on  the  qua¬ 
dratic  penalty  method,  which  we  will  describe  next,  and  variations  of  it.  For  any  scalar  c,  let  us 
define  the  augmented  Lagrangian  function 

Lc(xA)  =  f(x)  +  XTh(x)  +  j  |h(x)|2. 

We  refer  to  c  as  the  penalty  parameter  and  to  X  as  the  multiplier  vector  (or  simply  multiplier). 
The  quadratic  penalty  method  consists  of  solving  a  sequence  of  problems  of  the  form 

minfL^x^):  xeX), 

where  (Xk:  k  >  0)  is  a  bounded  sequence  and  (ck:  k>0)  is  a  penalty  parameter  sequence  such  that  0 
<  ck  <  ck+i  for  all  k  >  0.  and  ck— **>.  For  many  applications,  =  0  for  all  k  £  0.  In  this  section, 
we  consider  using  a  simulated  annealing  algorithm  to  minimize  the  augmented  Lagrangian  function 
for  each  ck.  Since  the  running  time  of  simulated  annealing  is  typically  very  long,  we  will  only 
consider  LCk(-,Xk)  for  k  =  0. 

Aragon  et  al.  [20]  demonstrated  that  the  quality  of  solutions  produced  by  the  quadratic 
penalty  method  in  conjunction  with  simulated  annealing  may  be  sensitive  to  the  value  of  the 
penalty  parameter.  In  their  experiments,  very  large  values  of  Co  resulted  in  poor  solutions,  prob¬ 
ably,  because  the  annealing  process  was  greatly  restricted  to  what  state  it  could  move  to.  For  very 
small  values  of  c0,  the  final  solution  of  the  simulated  annealing  algorithm  was  far  from  being  feasi¬ 
ble,  and  their  greedy  fix-up  algorithm  was  not  effective  enough  to  produce  good  solutions.  To  find 
a  good  parameter  value  experimentally  may  be  impractical,  since  simulated  annealing  is  typically 
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very  time  consuming.  Hence,  it  is  worthwhile  to  find  penalty  methods  that  perform  well  over  a 
wide  range  of  parameter  values.  We  will  introduce  adaptive  penalty  methods,  which  may  be 
penalty  methods  of  this  type. 

For  the  rest  of  this  section,  we  will  describe  two  adaptive  penalty  methods  and  the  following 
simple  hardlimiter  penalty  method.  In  Section  5.2,  we  compare  the  penalty  methods  by  computer 
experiments.  For  the  hardlimiter  penalty  method,  we  find  an  x  e  X  that  will  minimize  f(x)  + 
p(x,7),  where 


p(x.Y)  = 


0  if  |xj<y 


otherwise. 


and  y  is  a  nonnegative  scalar  parameter.  Note  that  if  y  =  0  then  minimizing  f(x)  +  p(x,Y)  is 
equivalent  to  solving  the  equality  constrained  problem  (ECP). 

The  two  adaptive  penalty  methods  we  consider  are  similar  to  the  quadratic  penalty  method, 
because  they  both  involve  minimizing  an  augmented  Lagrangian  function,  but  in  the  adaptive 
methods  the  multiplier  is  adaptively  and  periodically  adjusted. 

The  first  adaptive  penalty  method  is  inspired  by  the  method  of  multipliers  (see  [21]),  which  is 
used  in  nonlinear  programming,  and  the  second  method  is  a  slight  modification  of  the  first.  The 
procedure  of  the  method  of  multipliers  is  as  follows.  Let  (Cj:  j  >  0)  be  a  positive  monotone 
increasing  sequence.  For  each  j,  solve 

minimize  Le.(x,Xj) 
subject  to  x  e  X, 


and  if  Xj  is  an  optimal  solution  then 


Vi  =  \  +  Cjh(Xj). 
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Note  that  in  practice,  in  place  of  xy  we  use  the  solution  \}  found  by  the  simulated  annealing  algo¬ 
rithm,  which  is  generally  not  optimal. 

In  nonlinear  programming  applications  the  method  of  multipliers  has  advantages  over  the 
quadratic  penalty  method.  For  example,  suppose  that  for  the  equality  constrained  problem  (ECP) 
X  =  Rn,  f:  Rn— »R,  h:  Rn~»Rm,  and  f,  ge  C2.  In  addition,  suppose  the  following  assumption  holds. 

Assumption  (S)  [21]:  There  is  a  solution  x*  that  is  a  strict  local  minimum  and  a  regular  point 
of  ECP,  and  f.he  C2  on  some  open  sphere  centered  at  x*.  Furthermore,  x*  together  with  its  asso¬ 
ciated  Lagrange  multiplier  vector  X’  satisfies 

zTV^(f(x)  +  X*h(x))z  >  0, 

for  all  z  *  0  with  Vh(x‘)Tz  =  0. 

From  Proposition  2.4  of  [21],  for  any  value  of  Xq,  there  exists  a  c  >  0  such  that  if  Cj  £  c  then 
Xj— >x*  and  Xj  — »  X*.  where  x*  is  a  locally  optimal  solution  of  equality  constrained  problem  and  X* 
is  its  corresponding  Lagrange  multiplier.  However,  if  we  fix  Xj  =  0  for  all  j  (corresponding  to  the 
quadratic  penalty  function  method)  then  to  insure  that  Xj  converges  to  a  locally  optimal  solution  of 
the  equality  constrained  problem  we  must  have  Cj  —»  °°.  This  suggests  that  the  method  of  multi¬ 
pliers  may  be  less  sensitive  to  the  values  of  Cj.  However,  Cj  must  still  be  sufficiently  large.  For 
example,  suppose  f(x)  =  -x2,  h(x)  =  x,  and  X  =  [-1,1],  If  c  <  1  then  the  points  in  X  that  minimize 
Lc(x,X),  for  any  value  of  X  are  in  [-1,1],  and  both  -1  and  1  are  infeasible  solutions. 

For  the  method  of  multipliers,  it  may  be  necessary  to  minimize  a  number  of  augmented 
Lagrangian  functions.  Since  we  intend  to  do  the  minimization  by  simulated  annealing  and  simu¬ 
lated  annealing,  typically,  takes  a  long  time,  this  penalty  method  may  be  impractical.  This 
motivates  our  next  penalty  method,  which  we  call  the  dynamic  method  of  multipliers.  In  this 
penalty  method,  simulated  annealing  is  used  only  once  to  find  the  minimum  of  Lc(x,X),  and  the 


multiplier  X  is  updated  every  Z  iterations  during  this  run.  The  update  rule  for  X  is 
X^v*  =  Xqjj  +  ch,  where  h  is  the  average  value  of  h(x)  observed  since  the  last  time  X  was  updated. 
Note  that  X  is  updated  so  that  h(x)  will  drift  in  the  direction  of  zero.  Hence,  even  if  LC(.,X)  is  con¬ 
cave  we  have  some  hope  that  the  final  solution  will  be  close  to  being  feasible. 

We  list  three  disadvantages  of  this  method:  (1)  there  is  the  additional  complexity  of  choosing 
parameter  Z;  (2)  if  Z  is  small  then  updating  X  could  be  very  time  consuming;  and  (3)  computing 
h  may  be  time  consuming  if  the  number  of  equality  constraints  is  large.  We  will  set  Z  equal  to  the 
maximum  number  of  neighbors  a  state  in  the  annealing  process  has,  which  should  be  a  large 
number.  This  is  a  somewhat  arbitrary  choice  for  Z  and  is  not  a  general  recommendation.  Note 
that  we  can  set  Z  to  be  sufficiently  large  so  that  the  dynamic  method  of  multipliers  reduces  to  the 
quadratic  penalty  method.  Since  our  value  of  Z  will  typically  be  a  large  value,  the  dynamic 
method  of  multipliers  will  be,  in  a  sense,  a  perturbation  of  the  quadratic  penalty  method. 

The  third  disadvantage  can  be  eliminated  in  many  cases  if  is  updated  only  when  h,(x) 
changes  value  or  when  Xt  needs  to  be  updated. 

5.2.  Experimental  Results 

In  order  to  compare  the  different  penalty  methods  of  the  previous  section  we  applied  them  to 
solve  the  optimization  form  of  the  Graph  Partitioning  Problem  and  simulated  annealing  was  used  to 
perform  the  optimization.  Then  computer  experiments  were  done  to  compare  the  performance  of 
the  methods.  We  will  first  discuss  the  implementation  of  each  penalty  method.  Then  we  will 
present  and  discuss  the  experimental  results. 

The  basic  form  of  our  implementation  of  simulated  annealing  for  the  penalty  methods  is 
shown  in  Figure  5.2.1  (p.  84).  The  algorithm  in  this  figure,  as  well  as  all  other  algorithms 


presented  in  this  section,  is  written  in  pidgin  Algol.  We  refer  the  reader  to  [10]  for  more  details  of 


pidgin  Algol. 


The  input  to  the  algorithm  is  a  graph  (V£),  an  integer  K,  and  positive  integers 
K 

Op  o2 . ok  such  that  S  °i  =  lvl-  F°r  instance  of  the  problem,  the  set  of  states  is  the  set 

i-i 

of  partitions  of  V.  If  the  quadratic  penalty  method,  the  method  of  multipliers,  or  the  dynamic 
method  of  multipliers  is  used  then  the  cost  of  partition  (V, . VK)  is  its  capacity  plus 

£*,«>,-  |V,|)  +  |V,|)2. 

1-1  1  i-i 

If  the  hardlimiter  penalty  method  is  used  then  the  cost  of  a  partition  (V, . VK)  is  its  capacity 

K 

plus  £  pCOj-IVj.Y).  Two  partitions  (V,, .  .  .  ,  VjJ  and  (V,.  .  .  .  ,  VK)  are  neighbors  if 

i-i 

K 

IIIV.I-IV.I  |=2. 

i-i 

The  main  section  of  this  algorithm  is  the  while  loop,  which  simulates  the  annealing  process, 
and  the  fundamental  procedure  within  the  while  loop  is  move( ).  Procedure  move(  )  chooses  a  ran¬ 
dom  neighbor  of  the  current  partition,  each  neighbor  being  equally  likely  to  be  chosen.  If  the  cost 
of  the  neighbor  is  at  most  the  cost  of  the  current  partition  then  the  neighbor  is  accepted  as  the  new 
current  partition.  Otherwise,  it  accepts  this  neighbor  as  the  new  current  partition  with  probability 
exp(-  A/T),  and  with  probability  1  -  exp(-  A/T)  it  leaves  the  current  partition  as  is,  where  A  equals 
the  cost  of  the  neighbor  minus  the  cost  of  the  current  partition.  The  temperature  parameter  T  is 
decreased,  by  multiplying  it  by  TFACTOR,  and  it  is  decreased  after  every  MCLENGTH  calls  to 
move(  ).  For  our  experiments,  the  value  of  MCLENGTH  was  chosen  to  be  the  number  of  neigh¬ 
bors  a  partition  has  (=(K-1)|V|).  This  value  of  MCLENGTH  was  used  in  [22], 

After  the  algorithm  decreases  T,  it  checks  to  see  if  it  ihould  stop  simulating  the  annealing 
process.  Variable  numaccept  stores  the  number  of  times  neighbors  were  accepted  as  new  current 
partitions  since  the  last  time  T  was  decreased.  If  numaccept  equals  zero  then  the  annealing  process 


BASIC  SIMULATED  ANNEALING  ALGORITHM  FOR 
GRAPH  PARTITIONING 


Input:  Graph  (VJ-),  positive  integer  K,  positive  integers  Oj,  o2 . °K- 

Output:  A  feasible  partition  of  V  with  small  capacity. 


begin 

(Vi . Vk)  :=init_state  (  ),  T  :=  init_temperature  (  ); 

numaccept  :=  1,  numtemp  :=  0; 
while  (numaccept  >  0)  and  (numtemp  <  R )  (Jq 
begin 

numaccept  :=  0,  numtemp  :=  numtemp  +1; 
for  i  :=  1,  2  , ...,  MCLENGTH  £[2  move  (  ); 

T  :=  TF ACTOR  *  T; 
end 

if  for  some  i  I then  (Vt,  ....  VK)  :=  greedy_fix_up  (V, . V^); 


procedure  move  (  ) 
begin 

(V, . VK)  :=  random_neighbor  (Vi,  VK); 

A  :=  cost  (V . VK)  -  cost  (V,,  ....  VK); 

if  A  £  0  then  (V{, ...,  V^)  :=  (V^  ....  VK),  numaccept  :=  numaccept  +1; 
else  if  randunit  (  )  S  exp  (-  A/T) 

then  (V,,  ....  VK)  :=  (V,, ....  VK),  numaccept  :=  numaccept  +1; 
end 

function  randunit  (  ) 

begin  return  (a  random  number  uniformly  distributed  on  the  interval  [0,  1]) 
end 


Figure  5.2.1.  The  basic  implementation  of  the  penalty  method  for  graph  partitioning,  where  the 
minimization  is  done  by  simulated  annealing. 


is  assumed  to  be  "frozen"  and  the  simulation  is  stopped.  Variable  numtemp  stores  the  number  of 


umes  T  is  decreased  If  numtemp  exceeds  a  number  R  then  the  simulation  is  also  terminated. 

Lruualization  of  the  algorithm  consists  of  initializing  numtemp  and  numaccept,  and  calling 
irui_sta:e< )  and  irut_temperatu,re()  Function  init_state()  returns  a  feasible  partition  of  V  where  the 
a, mealing  process  can  sun  from.  Function  init_temperature  returns  an  initial  value  for  T.  The 
returned  value  is  computed  by  randomly  picking  one  hundred  neighbors  of  an  arbitrary  feasible 

partition  (V,,  ....  VK),  and.  for  all  neighbors  that  have  bigger  cost  than  (V, . V*),  the  sample 

average  A  of  the  cost  of  a  neighbor  of  (V, . minus  the  cost  of  (Vj . V*)  is  com¬ 

puted  The  temperature  T  that  is  returned  is  such  that  expf-A/T)  =  0.4.  This  method  of  initializ¬ 
ing  the  temperature  value  was  also  used  by  Aragon  et  al.  [20]  and  Kirkpatrick  et  al.  [1]. 

After  the  simulation  of  the  annealing  process  the  final  partition  (Vlf  •  •  •  .V*)  computed  may 
be  infeasible  The  function  greedy Jixjup  (V •  •  ,VK)  is  called  and  it  returns  a  feasible  partition. 
It  does  this  by  sequentially  transferring  nodes  from  subsets  V,  to  subsets  Vj,  such  that  |Vj  >  a,, 
and  |  Vj  |  <  Oj,  so  that  each  transfer  minimizes  the  increase  in  the  capacity  of  the  resulting  partition. 

We  will  now  discuss  the  different  implementations  of  the  penalty  function  methods.  For  the 
hardlimiter  (HL)  and  quadratic  (Q)  penalty  methods,  the  implementation  used  is  the  one  in  Figure 
5.2.1.  The  implementation  of  the  method  of  multipliers  (MM)  is  shown  in  Figure  5.2.2.  In  this 
implementation,  the  penalty  parameters  Cj  are  constant  and  equal  to  c  and  the  augmented  Lagran- 
gian  function  is  minimized  J+l  times  by  simulated  annealing.  The  first  minimization  is  done  by  a 
long  simulated  annealing  run  and  the  next  J  minimizations  are  done  by  runs.  In  between  minimi¬ 
zations  the  multiplier  vector  X  is  updated. 

The  implementation  of  the  dynamic  method  of  multipliers  (DMM)  is  shown  in  Figure  5.2.3. 
It  is  the  same  algorithm  in  Figure  5.2.1,  excluding  the  additional  lines  enclosed  in  the  two  rectan¬ 
gles.  These  additional  lines  and  the  two  additional  variables  Ah  =  (Ah,  :  i  =  1,  2 . K)  and 


METHOD  OF  MULTIPLIERS 


1 

(V„  VK)  :=init_state  (  ),  T  :=  init_temperature  (  ); 

for  i  :=  1,  2 . .  K,  da  h  :=  0; 

numaccept  :=  1,  numtemp  :=  0; 
while  numaccept  >  0)  and  (numtemp  <  R)  dQ 
begin 

numaccept  :=  0,  numtemp  :=  numtemp  +1; 

for  j  :=  1,  2 .  MCLENGTH  da  move  (  ); 

T  :=  TFACTOR  *  T; 
end 

T  :=  (TFACTOR)~f,uratcmp/J  *  T; 

£qi  i  :=  1,  2 .  J  do 

begin 

T  :=  f; 

for  j  :=  1.  2  ,  ....  K  d2  ^  +  Xj  +  c*(Oj-lVjl); 
numaccept  :=  1,  numtemp  :=  0; 
while  (numaccept  >  0)  and  (numtemp  <  R/J  )  da 
begin 

numaccept  :=  0,  numtemp  :=  numtemp  +1; 
for  t  :=  1,  2  ,  ....  MCLENGTH  da  move  (  ); 

T  :=  TFACTOR  *  T; 
end 
end 

if  for  some  i  ,  1  VJ^o,  then  (Vlf ....  VK)  :=  greedy_fix_up  (V,,  ....  VK); 


Figure  5  J.2.  The  implementation  of  the  method  of  multipliers. 


Zcount  are  used  in  maintaining  X.  Note  that  Ah/Z  is  the  sample  mean  of  o,—  IVj  since  the  last 
time  Xj  was  updated. 

For  the  algorithm  in  Figure  5.2.3,  Aht  is  updated  after  each  call  to  move( ).  As  remarked  in 
the  previous  section,  this  can  be  very  time  consuming  if  K  is  large.  A  more  efficient  method  is  to 
update  Ah,  only  when  jV,|  changes  value.  To  accomplish  this  we  have  a  global  counter  globcount 
which  is  incremented  immediately  after  every  call  to  move(  ),  and,  for  each  i,  such  that  1  2  i  2  K 
we  have  additional  variables  lastdiff,  and  lastcount,:  lastdiff;  equals  o,  -  |V,|  and  lastcount,  equals 
the  value  of  globcount  when  Ah,  was  updated  last.  Every  time  |Vj  changes  value  or  X*  is  to  be 
updated  we  increment  Ah,  by  a,  -  |V,|  +  (globcount  -  lastcount,  -  l)lastdiff,,  set  lastdiff,  to 
c,  -  |  Vj  | ,  and  set  lastcounq  to  globcount. 

Our  experiments  consisted  of  three  cases:  (n  =  240,  m  =  300,  K  =  2,  a,  =  cj2  =  200),  (n  = 
240,  m  =  300,  K  =  2,  a,  =  160,  a2  =  80),  and  (n  =  160,  m  =  200,  K  =  4,  ~  cr2  «  a3  =  a4  =  40). 

For  each  case,  ten  graphs  were  randomly  and  independently  generated  where  each  graph  in  G(njn) 
was  equally  likely  to  be  generated.  Recall  G(n,m)  is  the  set  of  graphs  with  node  set  (1,  2,...,  n) 
and  m  edges.  For  each  case  and  parameter  value  considered,  each  algorithm  was  executed  once 
for  each  of  the  ten  random  graphs.  During  these  ten  runs  the  average  number  of  times  T  was 
decremented  (AVG#TDEC)  and  the  average  capacity  of  the  final  partition  (AVGCAP)  were 
recorded  The  number  of  times  T  was  decremented  is  related  to  the  running  time  of  the  algorithm, 
since  the  number  of  times  T  was  decremented  multiplied  by  MCLENGTH  is  equal  to  the  length  of 
die  annealing  process  simulated. 

For  all  three  cases,  the  average  degree  of  the  graph  (equal  to  2|E|/|V|)  is  2.5  and  we  com¬ 
pared  the  following  penalty  methods. 


hardlimiter  penalty  method  when  y  =1; 

quadratic  penalty  method  when  c  =  1C TM'n  for  i  =  0,  1 . 10; 
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DYNAMIC  METHOD  OF  MULTIPLIERS 


begin 

(Vj, ....  VK)  :=  init_state  (  ),  T  :=  init_temperature  (  ); 

for  i  :=  1.  2 .  K  dfi  X*  :=  0,  Ah*  :=  0  ; 

Zcount  :=  Z; _ 

num accept  :=  1,  numtemp  :=  0; 
while  (num accept  >  0)  and  (numtemp  <  R)  dfi 
begin 

numaccept  :=  0,  numtemp  :=  numtemp  +1; 

for  i  :=  1.  2 .  MCLENGTH  dfi 

begin 

move  (  ) 

for  j  :=  1.  2  . ....  dfl  Ahj  :=  Ahj  +  Oj  -  I  Vj  I ; 

Zcount  :=  Zcount  -1 
if  Zcount  =  0  then 
begin 

for  j  :=  1  , ....  K  dfl  +  cAhj/Z,  Ahj  :=  0  ; 
Zcount  :=  Z; 

end _ 

end 

T  :=  TFACTOR  *  T; 
end 

if  for  some  i  then  (V; . VK)  :=  greedy_fix_up  (V, . VK); 

end 


Figure  5.2  J.  The  implementation  of  the  dynamic  method  of  multipliers. 


method  of  multipliers  when  J  =  4  and  c  =  KT4**72  for  i  =  0, 1, ....  10; 

dynamic  method  of  multipliers  when  Z  =  (K-1)|V|  and  c  =  for  i  =  0, 1 . 10. 

For  the  hardlimiter  penalty  method,  the  quadratic  penalty  method,  and  the  dynamic  method  of 
multipliers,  we  let  TF ACTOR  equal  0.9.  However,  we  let  TFACTOR  equal  0.81  for  the  method 
of  multipliers,  since  it  consists  of  multiple  simulated  annealing  runs.  The  result  was  that  the  sum 
of  the  lengths  of  the  annealing  processes  simulated  were  roughly  the  same  for  all  the  penalty 
methods.  We  will  now  discuss  some  of  the  details  of  each  case. 

Case  1  (n  =  240,  m  =  300,  K  =  2,  =  a2  =  120):  In  this  case,  the  resulting  partition  is 

required  to  be  balanced.  For  all  penalty  methods,  we  ran  all  the  algorithms  with  R  equal  to  250. 
We  also  ran  the  algorithm  for  the  hardlimiter  penalty  method  for  R  equal  to  50,  so  that  the  length 
of  the  annealing  process  simulated  would  be  roughly  the  same  as  for  the  other  penalty  methods. 
The  data  are  given  in  tables  in  Figure  5.2.4  and  are  plotted  in  graphs  in  Figure  5.2.5. 

Case  2  (n  =  240,  m  =  300,  K  -  2,  Oi  =  160,  a2  -  80):  In  this  case,  the  partition  is  required 
to  be  unbalanced.  For  all  penalty  methods,  we  ran  all  the  algorithms  with  R  equal  to  250.  We 
also  ran  the  algorithm  for  the  hardlimiter  penalty  method  with  R  equal  to  40,  so  that  the  length  of 
the  annealing  processes  simulated  would  be  roughly  the  same  as  for  the  other  penalty  methods. 
The  data  are  given  in  tables  in  Figure  5.2.6  and  plotted  in  graphs  in  Figure  5.2.7. 

Case  3  (n  =  160,  m  =  200,  K  =  4,  Oj  =  o2  =  03  =  O4  =  40): 

In  this  case,  the  partitions  should  be  balanced,  but  we  have  four  sets  to  the  partition  rather 
than  two  as  in  Case  1.  For  all  penalty  methods,  we  ran  all  the  algorithms  with  R  equal  to  75.  The 
data  are  given  in  tables  in  Figure  5.2.8  and  plotted  in  graphs  in  Figure  5.2.9. 

For  all  three  cases,  the  quality  of  solutions  for  the  quadratic  penalty  method,  the  dynamic 
method  of  multipliers,  and  the  method  of  multipliers  were  dependent  on  the  value  of  c.  Very  small 
values  of  c  lead  to  poor  solutions,  because  the  solutions  found  by  simulated  annealing  were  far 
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Figure  5.2.4.  Results  from  computer  experiments  for  Case  1  (n  =  240,  m  =  300,  K  =  2,  o,  =  07  = 
120).  AVGCAP  =  the  average  capacity  of  the  final  partition  and  AVG#TDEC  =  the  average 
number  of  times  T  was  decremented.  Q  =  quadratic  penalty  method;  MM  =  method  of  multipliers; 
DMM  =  dynamic  method  of  multipliers;  and  HL  =  hardlimiter  penalty  method. 
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Figure  5 .2.5.  Graphs  of  results  from  computer  experiments  for  Case  1  (n  =  240,  m  =  300,  K  =  2, 
Oj  =  o2  =  120).  AVGCAP  =  the  average  capacity  of  the  final  partition  and  AVG#TDEC  =  the 
average  number  of  times  T  was  decremented.  Q  =  quadratic  penalty  method;  MM  =  method  of 
multipliers;  DMM  =  dynamic  method  of  multipliers;  and  HL  =  hardlimiter  penalty  method. 
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Figure  5.2.6.  Results  from  computer  experiments  for  Case  2  (n  =  240,  m  =  300,  K  =  2,  Oj  =  160, 
o2  =  80).  AVGCAP  =  the  average  capacity  of  the  final  partition  and  AVG#TDEC  =  the  average 
number  of  times  T  was  decremented.  Q  =  quadratic  penalty  method;  MM  =  method  of  multipliers; 
DMM  =  dynamic  method  of  multipliers;  and  HL  =  hardlimiter  penalty  method. 
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Figure  5-2.7.  Graphs  of  results  from  computer  experiments  for  Case  2  (n  =  240,  m  =  300,  K  =  2, 
Oj  =  160,  o2  =  80).  AVGCAP  =  the  average  capacity  of  the  final  partition  and  AVGfTDEC  =  the 
average  number  of  times  T  was  decremented.  Q  =  quadratic  penalty  method;  MM  =  method  of 
multipliers;  DMM  =  dynamic  method  of  multipliers;  and  HL  =  hardlimiter  penalty  method. 
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Figure  5.2.8.  Results  from  computer  experiments  for  Case  3  (n  =  160,  m  =  200,  K  =  4,  Oj  =  o2  = 
o3  =  o4  =  40).  AVGCAP  =  the  average  capacity  of  the  final  partition  and  AVG#TDEC  =  the 
average  number  of  times  T  was  decremented.  Q  =  quadratic  penalty  method;  MM  =  method  of 
multipliers;  DMM  =  dynamic  method  of  multipliers;  and  HL  =  hardlimiter  penalty  method. 
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Figure  5.2.9.  Results  from  computer  experiments  for  Case  3  (n  =  160.  m  =  200,  K  =  4,  O]  =  o2  = 
o3  =  o4  =  40).  AVGCAP  =  the  average  capacity  of  the  final  partition  and  AVG#TDEC  =  the 
average  number  of  times  T  was  decremented.  Q  =  quadratic  penalty  method;  MM  =  method  of 
multipliers;  DMM  =  dynamic  method  of  multipliers;  and  HL  =  hardlimiter  penalty  method. 
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from  satisfying  the  equality  constraints  and  the  greedy  fix-up  algorithm  did  not  produce  near 
optimal  solutions.  Very  large  values  of  c  lead  to  poor  solutions,  because  the  large  quadratic  term 
in  the  augmented  Lagrangian  function  restricted  the  movement  of  the  annealing  process. 

For  all  three  cases,  the  dynamic  method  of  multipliers  produced  low-valued  AVGCAP  values 
over  the  widest  range  of  values  of  c  than  the  quadratic  penalty  method  or  the  method  of  multi¬ 
pliers.  In  Figure  5.2.5  note  that  when  c  equaled  0.001  the  dynamic  method  of  multipliers  produced 
a  smaller  value  of  AVGCAP  and  a  larger  value  of  AVG#TDEC  than  the  quadratic  penalty  method 
or  the  method  of  multipliers.  The  implication  is  that  there  may  be  a  quality  of  solution  versus  run¬ 
ning  time  tradeoff  in  Case  1.  However,  there  does  not  seem  to  be  such  a  tradeoff  in  the  other  two 
cases. 

Note  that  the  dynamic  method  of  multipliers  yields  better  solutions  than  the  quadratic  method 
for  small  values  of  c,  while  the  reverse  is  true  for  large  values  of  c.  This  can  partly  be  explained 
by  the  fact  that  in  the  dynamic  method  of  multiplier  the  linear  term  of  the  augmented  Lagrangian 
function  behaves  roughly  like  an  extra  quadratic  term  by  the  way  it  is  updated.  Hence,  the 
dynamic  method  of  multipliers  is  like  a  quadratic  penalty  method  but  with  a  larger  penalty  parame¬ 
ter. 

A  final  observation  is  that  the  hardlimiter  penalty  method  performed  well  when  compared  to 
the  other  penalty  methods.  For  Case  1,  the  AVGCAP  value  of  the  hardlimiter  method,  when  R 
equals  250,  is  smaller  than  the  other  penalty  methods,  and  the  AVGCAP  value  of  the  hardlimiter 
method,  when  R  equals  50,  is  within  4%  of  the  smallest  AVGCAP  value  of  the  other  methods. 
For  Case  2,  the  AVGCAP  value  of  the  hardlimiter  method,  when  R  equals  250,  is  within  8%  the 
smallest  AVGCAP  value,  and  the  AVGCAP  value  of  the  hardlimiter  method,  when  R  equals  40,  is 
within  11%  of  the  best  AVGCAP  value.  For  Case  3,  the  AVGCAP  value  of  the  hardlimiter 
penalty  method  is  within  4%  of  the  smallest  AVGCAP  value  of  the  other  penalty  methods. 
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Although  the  hardlimiter  penalty  method  performs  well  over  all  cases,  for  certain  values  of  c  the 
dynamic  method  of  multipliers  produces  better  solutions  for  Cases  2  and  3. 


We  will  digTess  to  make  a  comment  on  the  sensitivity  of  the  quadratic  penalty  method,  the 
method  of  multipliers,  and  the  dynamic  method  of  multipliers  to  the  value  of  c  in  our  experiments. 
The  reason  why  c  should  not  be  too  small  can  be  explained  if  we  assume  that  the  capacity  of  parti¬ 


tion  (V,, . .  .  ,VK)  is  well  approximated  by  (l-£ 
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)m,  which  is  the  average  capacity  of 


(V„  . .  .  ,Vk)  over  all  graphs  in  G(njn).  Then  the  augmented  Lagrangian  function  is  well 
approximated  by  r(glfg2, .  . .  ,gK) 
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where  gi  =  Ivj/n.  The  Hessian  matrix  of  T  is  (-2m+cn2)I,  where  1  is  the  identity  matrix.  There¬ 
fore,  T  is  convex  if  (£-^7-  and  concave  if  aS-~-.  In  Figures  5.2.5,  5.2.7,  and  5.2.9,  note  that  as 

rr  n2 

parameter  c  decreases,  the  quality  of  solutions  found  by  the  quadratic  penalty  method  begins  to 

degrade  after  c  crosses  the  value  of  (■=—■  is  equal  to  0.01  for  Cases  1  and  2  and  is  equal  to 

xr  n2 

0.016  for  Case  3). 


5.3.  Conclusions 

In  this  chapter,  we  considered  solving  the  equality  constrained  problem  by  simulated  anneal¬ 
ing  and  penalty  methods.  In  particular,  we  focused  on  the  quadratic  penalty  method  and  adaptive 
variations  of  it.  Our  concern  with  the  quadratic  penalty  method  was  that  resulting  solutions  were 
sensitive  to  its  penalty  parameter  values.  Since  simulated  annealing  is  a  very  time  consuming  algo¬ 
rithm,  finding  a  good  penalty  parameter  value  experimentally  could  be  impractical. 


This  leads  us  to  investigate  adaptive  penalty  methods  that  would  perhaps  be  less  sensitive  to 
parameter  values.  The  two  adaptive  penalty  methods  we  considered  were  based  on  the  method  of 
multipliers,  and  required  the  minimization  of  an  augmented  Lagrangian  function.  However,  the 
multiplier  vector  of  the  function  is  dynamically  and  periodically  adjusted.  Of  these  two  adaptive 
penalty  methods  our  experiments  showed  that  the  dynamic  method  of  multipliers  worked  best  and 
produced  low-valued  solutions  over  a  wider  range  of  penalty  parameter  values  than  the  quadratic 
penalty  method.  We  think  further  investigation  of  adaptive  penalty  methods  should  be  done. 


CHAPTER  6 


CONCLUSIONS 

6.1.  Summary  of  Thesis 

This  thesis  consists  of  a  collection  of  results,  most  of  which  concern  the  finite-time  behavior 
of  simulated  annealing. 

In  Chapters  2  and  3,  we  analyzed  simulated  annealing  when  it  is  applied  to  the  matching 
problem.  In  Chapter  2,  we  showed  that  in  the  worst  case  simulated  annealing  solves  the  problem 
in  average  time  that  is  at  least  exponential  in  the  number  of  nodes  of  the  graph  if  (a)  the  simulated 
annealing  algorithm  is  the  basic  simulated  annealing  algorithm  in  Subsection  2.2.1  or  (b)  if  we  res¬ 
trict  our  attention  to  the  constant-temperature  schedules.  An  upper  bound  on  the  average  time  it 
takes  the  basic  simulated  annealing  algorithm  of  Subsection  2.2.1  to  find  a  near  maximum  match¬ 
ing  is  also  given.  If  we  only  require  the  algorithm  to  find  a  matching  of  size  at  least  a  fixed  frac¬ 
tion  of  the  size  of  the  maximum  matching,  this  upper  bound  is  polynomial  in  the  number  of  nodes 
of  the  graph.  An  estimate  on  the  average  time  the  basic  simulated  annealing  of  Subsection  2.2.1 
will  solve  the  matching  problem  for  a  "typical"  graph  is  given  in  Chapter  3.  We  also  presented 
computer  simulation  data  that  demonstrated  that  this  estimate  was  reasonable.  If  we  restrict  our 
attention  to  graphs  that  have  at  least  as  many  edges  as  there  are  nodes,  then  the  estimate  is  a  poly¬ 
nomial  function  of  the  number  of  nodes  of  the  graph,  which  contrasts  the  results  in  Chapter  2  and 
which  is  encouraging  to  proponents  of  simulated  annealing. 

Since  there  are  efficient  algorithms  available  to  solve  the  matching  problem  [9],  it  is  doubtful 
that  simulated  annealing  will  be  used  on  that  problem  in  practice.  However,  our  analysis  seem  to 
be  the  first  thorough  theoretical  analysis  of  the  average  time  complexity  of  simulated  annealing 
applied  to  a  nontrivial  combinatorial  optimization  problem.  Also,  we  can  at  least  test  our  intuition 


and  experience  against  these  results.  For  example,  since  simulated  annealing  is  a  simple  heuristic, 
we  would  not  expect  it  to  outperform  more  sophisticated  methods  in  the  worst  case,  and  our 
exponential  average-time  lower  bounds  compared  with  the  0(V[V|  |E|)  time  algorithm  of  [9]  cer¬ 
tainly  supports  this  expectation. 

In  Chapter  4,  we  presented  a  collection  of  results.  The  template  method  and  some  examples 
of  its  use  were  given  in  Section  4.2,  and  the  threshold  random  search  algorithm  was  given  in  Sec¬ 
tion  4.3.  In  Section  4.4,  we  presented  conditions  which  imply  that  no  monotone  decreasing  tem¬ 
perature  schedule  is  optimal. 

The  use  of  adaptive  penalty  methods  to  solve  equality  constrained  problems  by  simulated 
annealing  was  investigated  in  Chapter  5.  One  of  these  methods  (the  dynamic  method  of  multi¬ 
pliers)  was  shown,  through  experiments,  to  provide  low-valued  solutions  over  a  wider  range  of 
parameter  values  than  the  static  penalty  method  (quadratic  penalty  method)  we  considered.  We 
believe  further  study  of  adaptive  penalty  methods  should  be  done. 

6.2.  Directions  for  Future  Research  on  Simulated  Annealing 

One  direction  for  future  research  is  to  extend  the  results  of  Section  2.4.  Rather  than  restrict¬ 
ing  ourselves  to  constant  temperature  schedules  we  may  consider  monotone  decreasing  temperature 
schedules.  Since  we  are  relaxing  the  constraints  for  the  temperature  schedules,  we  will  have  to 
consider  a  smaller  set  of  transition  probability  matrices  than  the  set  R(G)  of  Section  2.4  in  order  to 
maintain  the  same  lower  bound  for  Q. 

A  second  direction  for  future  research  is  to  use  a  "physicist’s"  approach  to  analyze  the 
behavior  of  simulated  annealing  for  some  NP-Complete  problem.  What  we  mean  by  a  physicist’s 
approach  is  to  make  reasonable  assumptions  to  model  the  annealing  process  by  a  process  that  can 
be  analyzed.  Then  experiments  should  be  used  to  check  the  accuracy  of  the  model.  An  example 
of  this  type  of  analysis,  but  on  a  known  polynomial-time  problem,  was  done  in  Chapter  3. 
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Consider  the  system  (S,  c,  R),  where  S  is  a  set  of  states,  c  is  a  cost  function  on  S,  and  R  is  a 
probability  transition  matrix  over  S.  Let  (Xk:  k  £  0)  be  an  annealing  on  the  system  (S,  c,  R)  with 
temperature  schedule  T.  Most  theoretical  research  is  concerned  with  determining  a  good  tempera¬ 
ture  schedule.  However,  it  may  be  that  the  performance  of  simulated  annealing  is  more  dependent 
on  the  choice  of  R.  A  third  direction  for  future  research  is  to  study  how  the  choice  of  R  affects 
the  performance  of  simulated  annealing.  A  related  direction  for  future  research  is  to  come  up  with 
ways  to  systematically  modify  R  that  may  improve  the  performance  of  the  simulated  annealing 
algorithm.  For  example,  using  R2  rather  than  R  may  be  preferable  in  certain  cases.  The  template 
method  may  be  used  to  illustrate  these  cases. 
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